Abstract

There are multiple strategies for answering questions. For example, a statement is 
sometimes verified using a plausibility process, and sometimes by using a direct-retrieval 
process. It is claimed that there is a distinct strategy-selection phase and a framework is 
proposed to account for strategy-selection. Six experiments support the assumptions of 
the proposed framework. The first three experiments show that strategy-selection is under 
the strategic control of the subjects. These experiments also indicate what contextual 
variables affect this selection. Experiments 4 and 5 suggest that strategy selection also
involves evaluating the question itself, while Experiment 6 suggests variables that 
influence the evaluation of the question. This model is shown to be consistent with 
processing strategies in domains other than question-answering, viz., dual-task monitoring 
in divided-attention situations. 

In his paper Memory, Knowledge, and the Answering of Questions (1973), Norman points
out that the process of question-answering is far from simple and that the "traditional
psychological studies of memory" do not tell us about the way that knowledge is used
to answer questions. There has been considerable effort devoted to understanding
memory and memory retrieval, as it relates to recognition and recall tests. There are
formal theories of how to find specific facts in memory (e.g., Anderson, 1972; 1976;
Atkinson & Shiffrin, 1968; Bower, 1972; Kintsch, 1970; Raaijmakers & Shiffrin, 1981;
Ratcliff & Murdock, 1976). There has also been work on other ways of answering
questions, such as searching one's autobiographical memory (e.g., Reiser, Black &
Abelson, 1985; Whitten & Leonard, 1981; Williams & Hollan, 1981; Williams &
Santos-Williams, 1980) and making plausibility judgments (e.g., Collins, 1978a,b). However,
there has been little work on whether people select strategies, and if so, how people 
decide which strategy or process to use in order to answer a question. 

Many question-answering models of memory include a preliminary stage where the 
subject performs an evaluation of the query to see if a quick decision can be made or 
if more work is required. For example, Norman (1973) pointed out that people do not 
search memory for the answer to the question, "What is Charles Dickens' telephone 
number?". Rather, some initial pre-processing allows us to decide that further search 
would be fruitless. A key assertion in this paper is that people's flexibility in their 
control of memory retrieval goes far beyond simply a decision whether to "fast exit."
People have multiple strategies for retrieving information and choose to apply these 
strategies in variable orders. In a theory where there is variable strategy selection, it 
seems reasonable to propose that people's initial evaluation of their memory, with 
respect to the question, affects that strategy selection. There have been a number of
endeavors concerned with self-assessment of knowledge (e.g., Gentner & Collins, 1981;
Hart, 1965; Lachman & Lachman, 1980; Norman, 1973; Nelson, Gerler & Narens, 1984),
although this work has tended not to address strategy selection.



This paper presents a general framework for the process of answering questions 
from memory. The paper focuses primarily on verification and recognition tasks, but not 
exclusively, and the framework is shown to generalize to other types of question- 
answering situations. There are a number of assumptions to the model proposed here. 
Before reviewing evidence for some of its claims and presenting new data in support of
additional hypotheses, it would be useful to highlight the critical ideas: 

1. There are multiple strategies for question-answering. 

2. One strategy is to try to find a fact which encodes the answer in memory. 
This is the direct retrieval strategy. 

3. Another strategy is to compute a plausible answer given a set of facts stored 
in memory. This is the plausibility strategy. 

4. Before answering a question, a person engages in an initial strategy-selection 
phase to decide which strategy or sequence of strategies to use. 

5. The strategy-selection stage consists of an initial evaluation of knowledge
relevant to the question followed by a decision of which strategy to follow. 
The initial evaluation is an automated process while the decision is a 
controlled process. 

6. In the initial evaluation, a person assesses how familiar the words in the 
question are. The more familiar the words, the more the person is biased to 
direct retrieval. 

7. In the initial evaluation, the person also assesses how many intersections in 
memory there are among the words from the question. The more
intersections, the more the person is biased towards plausibility.

8. The strategy decision process integrates information from the initial evaluation 
with factors extrinsic to the question in order to select a strategy. Extrinsic 
influences include instructions and probability that a particular strategy will be 
successful. 
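
Assumptions 5 through 8 can be made concrete with a toy sketch. The framework above is stated qualitatively, so everything quantitative here is an illustrative assumption rather than part of the proposed model: the 0-to-1 scales, the additive scoring rule, and the names `InitialEvaluation`, `select_strategy`, and `extrinsic_bias` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InitialEvaluation:
    """Output of the automatic initial evaluation (assumptions 6 and 7)."""
    familiarity: float    # familiarity of the question's words (hypothetical 0..1 scale)
    intersections: float  # memory intersections among the words (hypothetical 0..1 scale)


def select_strategy(evaluation: InitialEvaluation,
                    extrinsic_bias: float = 0.0) -> str:
    """Controlled decision stage (assumption 8).

    extrinsic_bias > 0 stands for factors favoring direct retrieval
    (e.g., recognition instructions, a high success rate for retrieval);
    extrinsic_bias < 0 stands for factors favoring plausibility.
    The additive rule is an assumption made only for illustration.
    """
    score = evaluation.familiarity - evaluation.intersections + extrinsic_bias
    return "direct_retrieval" if score > 0 else "plausibility"
```

Under this sketch, a question with familiar words and few intersections is routed to direct retrieval, while the same question can be routed to plausibility if the extrinsic bias is sufficiently negative.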



After reviewing the evidence that strategy selection (or bias) is involved whenever a 
question is answered, the paper will go on to suggest the mechanisms that people use 
for deciding quickly which strategy to apply. Understanding these mechanisms and the
variables that influence them is the primary focus of this paper. 

Arguments in Support of Strategy Selection 
Searching Memory for a Specific Fact Is Not Always Preferred 

The default assumption of memory theorists has tended to be that when verification 
of a proposition is required, a careful search for a specific proposition is the first 
strategy tried (at least after an initial evaluation). 1 The reason that direct retrieval is 
assumed to be tried first is that it is also commonly believed that direct matching is a 
more efficient process than inferential reasoning (e.g., Anderson, 1976; Anderson &
Bower, 1973; Camp, Collins & Loftus, 1975; Collins & Quillian, 1969; Haviland & Clark,
1974; Kintsch, 1974; Lachman, 1973; Lachman & Lachman, 1980; Lehnert, 1977;
Norman, Rumelhart & the LNR Research Group, 1975; Quillian, 1968; Schank & Abelson,
1977). Lachman and Lachman (1980) articulate this commonly held conception of the
preference for one strategy over another: 

When a person needs a particular piece of information-e.g., to answer a 
question-she attempts to retrieve it directly. Metamemorial processes return the 
information that an answer is or is not in store. If an answer is found,
metamemorial control processes are involved in assessing its adequacy. If no
answer, or an inadequate answer, is retrieved, then the process of inference is
set into motion. (pp. 289-290)

There is now a growing body of literature suggesting that searching memory for an 
exact match is not always done even in tasks that require such careful inspections (e.g., 
Erickson & Mattson, 1981; Reder, 1982; Reder & Ross, 1983; Reder & Wible, 1984).
Erickson and Mattson asked subjects questions like "How many animals of each kind did
Moses take on the Ark?". Subjects almost uniformly reply "two" even though they know
that Noah took the animals on the ark. It seems in this case that people do not bother
to carefully inspect their memories for exact matches to the memory probe. Their data
cannot be explained by assuming that subjects accessed the correct memory trace,
noted the discrepancy, but then obligingly gave the intended answer. If that were true,
then subjects should not find it difficult to verbally note the discrepancy when specifically
instructed to do so. In fact subjects have a great deal of difficulty with such a task:
Reder and Dennler (in preparation) constructed a large number of these trick questions
and told half the subjects to give the answer only when the question was presented in
its correct form (i.e., answer when the question uses "Noah", but say "can't say" when
the question uses "Moses") and told the other half of the subjects to give an answer
based on the "gist" of the question (i.e., regardless of whether or not the question used
"Moses"). Subjects were significantly faster and more accurate in the condition where 
they could ignore whether the question was properly formed or not. Subjects found it 
very difficult to say "can't say." It seems then that question-answering often proceeds 
by loose inspection of the data-base rather than by searching for one specific 
proposition. 

The robustness of the "Moses illusion" suggests that subjects rarely prefer a 
strategy that involves a careful match to memory of one specific fact; however, other 
data suggest the opposite. For example, Singer and Ferreira (1983) found that subjects 
were faster to answer questions that involved an exact restatement of a sentence read 
in a story or that involved inferences likely to be drawn during reading than they were to 
answer sentence paraphrases or inferences not required for story comprehension. The 
evidence that subjects tend to verify statements by searching for an exact match comes 
from experiments involving short delays between study and test. In situations where the 
delays are longer, there is evidence that strategy-preference changes from searching for 
exact matches to using a plausibility strategy (Reder, 1982; Reder & Wible, 1984).

Strategy-Preference Is Not Stable for the Same Questions in the Same Task

In Reder (1982), subjects answered questions based on short stories they read. 
One group of subjects was required to decide whether a particular sentence had been 
studied, while the other group was to judge whether a particular statement was plausible 
given the story read. (See Table 1 for an example story.) Half of the plausible test
probes had been presented in the story (a different, random set for each subject). 
Although it might seem reasonable that the verbatim or direct-retrieval strategy would be 
used exclusively for the recognition task and the plausibility strategy for the plausibility 
task, the data indicated that subjects often tried first the strategy that corresponds to 
the other task. At short delays between reading of the story and test, subjects in both
groups tended to prefer the direct-retrieval (or verbatim-match) strategy, while at longer 
delays, both groups tended to prefer the plausibility strategy. 

Insert Table 1 about here 



The evidence for use of both strategies in both tasks comes from the pattern of 
latencies and errors with respect to the plausibility of the test items. Items differed in 
their degree of plausibility, half highly plausible, half moderately plausible. Not 
surprisingly, it takes subjects longer to decide that a moderately plausible statement is 
plausible and they less consistently judge moderately plausible statements to be plausible 
than highly plausible. However, these plausibility effects obtain even when subjects are
asked to recognize whether or not a fact had been stated in the story: moderately
plausible statements were recognized more slowly than highly plausible; also moderately 
plausible statements were recognized less often than highly plausible statements, 
regardless of whether or not they had actually been presented (causing recognition
accuracy to be better for highly plausible presented statements and worse for highly
plausible not-presented statements). These results are taken as evidence that sometimes
statements are "recognized" by determining that they are plausible.

There is also evidence of use of the direct-retrieval strategy in the plausibility task. 
Plausible statements that had not appeared in the story were judged implausible more 
often than when they had appeared in the story. Also, the size of the plausibility effects 
(difference in RT between moderately and highly plausible statements) was larger for 
probes that had not appeared in the story, and therefore could not have been verified 
by the direct-retrieval strategy. 

For those statements that had appeared in the story, there was less use of the 
direct-retrieval strategy at a delay. This evidence comes from the change in the size of 
plausibility effects with delay. Differences in RT between moderately and highly plausible 
statements increased with delay for presented statements. This suggests that people 
change strategy preference from direct-retrieval at short delays to the plausibility strategy 
at longer delays. 

The increase in error rates in the recognition task for not-stated items, especially 
for highly plausible statements, also indicates a shift in preference for the plausibility 
strategy with longer delays: At longer delays, subjects tend to prefer the plausibility 
strategy even in the recognition task. Highly plausible statements will most likely be 
judged plausible, causing an error for those statements that had not been presented in 
the story. 

Converging Results that Support a Strategy-Selection Stage 

The interpretation that strategy choice varies with the delay between reading the 
story and test suggests that people have a mechanism that allows them to select a 
strategy for question-answering prior to executing that strategy. Below I review additional
data that support a preliminary strategy-selection phase. In addition, new experiments will
be reported that further strengthen the case for an initial selection phase. 

By assuming that people are able to select the strategy that they use in tasks 
such as question answering, a number of results are more easily interpreted. For 
example, an unusual finding from Reder (1982) was that subjects asked to make a
plausibility judgment were very slow to judge not-stated items as plausible, but only when
the delay between reading and test was short. As the delay increased, subjects actually
became faster than they were at shorter delays to judge not-stated items as plausible. 

The explanation given in Reder (1982) was that at short delays, the wrong strategy 
is tried first. The flowchart model displayed in Figure 1 represents the probabilistic
model offered in that paper. It represents the branching alternatives associated with
judging an assertion, regardless of whether the person was asked to make a plausibility
judgment or a recognition judgment. Each branch (reflecting a choice path) has a
probability associated with it and was affected by variables such as the delay between
reading the story and test, the official task requirements or the plausibility of the test
probe. 

Insert Figure 1 about here 



The speed-up for not-presented plausible statements was explained by assuming 
that at short delays subjects tried the direct retrieval strategy first (right branch of tree) 
and when it failed, they went on to compute the statement's plausibility; at longer 
delays, they tried plausibility without first executing the useless strategy (left branch), 
thereby saving time. 
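
The serial account of this speed-up can be illustrated with a small calculation. This is a minimal sketch under stated assumptions, not the fitted model from Reder (1982): the function name, the parameter names, and the simplification that stage durations do not themselves change with delay are all hypothetical.

```python
def expected_rt_not_stated(p_direct_first: float,
                           t_direct_fail: float,
                           t_plausibility: float) -> float:
    """Expected time to judge a not-stated, plausible probe as plausible.

    Right branch: direct retrieval is tried first, fails (the probe was
    never stated), and plausibility is then computed.
    Left branch: plausibility is computed without trying retrieval first.
    All three parameters are hypothetical illustration values.
    """
    right_branch = t_direct_fail + t_plausibility
    left_branch = t_plausibility
    return p_direct_first * right_branch + (1.0 - p_direct_first) * left_branch
```

If the probability of trying direct retrieval first falls as the delay grows (say, from 0.8 to 0.2) while the stage durations stay fixed, the expected latency for not-stated probes falls as well, which is the qualitative pattern described above.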

Put another way, the claim is that people sometimes adopt as a first strategy one
that is inappropriate for certain conditions. This can also explain a speed-up found in
the data of Reder and Wible (1984). That experiment required subjects to make either
recognition judgments or consistency judgments about statements after having studied
groups of thematically related facts. Judging consistency meant deciding whether a
probe was thematically similar to a studied statement. At short delays, subjects were very
slow to verify not-stated, consistent items in a consistency judgment task. At longer
delays, they became faster and more accurate to verify this type of item, while they
became much slower to reject those items in a recognition task. Again, the explanation
was that subjects preferred the direct retrieval strategy at short delays, an inappropriate
strategy for not-stated, consistent items in the consistency judgment task. At longer
delays, the direct retrieval strategy was often avoided, saving time for not-stated items in
the consistency judgment task, but causing errors or slow responses for not-stated
consistent items in the recognition task.

Other factors that influence strategy selection

The tendency to prefer the direct retrieval strategy or the consistency strategy was 
not just affected by delay between study and test. It was also influenced by whether
the official task was to make recognition judgments or consistency judgments. Even in
situations where there are not explicit instructions, strategy-selection can be affected by
impressions or expectations. For instance, Gould and Stephenson (1967) found that
subjects' willingness to engage in reconstructive recall was affected by their perception
of the emphasis on verbatim recall. 

Reder and Ross (1983) have shown that even within a recognition task, strategy 
preference can depend on the relation between the true and false assertions to be 
discriminated, i.e., subjects adjust their strategy-selection to reflect the difficulty of the 
discrimination. In Reder and Ross, subjects studied related sets of facts about fictitious
individuals (e.g., Marty going to the circus), and had to discriminate studied sentences
from non-studied foils. When these foils were thematically related to the studied facts
(e.g., also about Marty at the circus), subjects tended to adopt the direct-retrieval
strategy. When foils were unrelated to the circus theme, subjects tended to base their
"recognition judgments" on a plausibility strategy. 

In a similar vein, Lorch (1981) showed that in a category-membership task, when 
unrelated items were used as foils, subjects tended to adopt a strategy that seemed 
consistent with the semantic overlap model of Smith et al. When foils consisted of highly
related and somewhat related terms, but no unrelated terms, subjects tended to adopt a
strategy of careful evaluation of subject-predicate relations.

It is worth noting that I am deliberately vague about the consequences of strategy
selection on the nature of the deployment of the competing strategies, e.g., does the
preferred strategy execute first, by itself? In Reder (1982), I presented a model in
which subjects sequentially tried one strategy and then another; however, it was noted
there that an equivalent model would involve a (parallel) race between the two strategies
where subjects biased the amount of capacity given to each strategy. The
data reported here do not discriminate between these two conceptions. The claim is
more abstract: people differentially deploy resources (by parallel or serial execution) to
strategies as a function of an initial strategy selection decision.

New Evidence for Controlled Strategy Selection: 
The Role of Situational Variables

The data reviewed thus far seem easily explained if one assumes that people have 
the ability to decide how they want to go about answering questions. This assumption 
leaves open a number of questions. For example, does a person decide each time he
or she attempts a question which strategy will be preferred? Or do people select a
preferred strategy to use throughout a task based on knowledge of instructions and
similarity of foils to targets? How sensitive are people to the success rate of a
particular strategy? Can we quickly adjust strategy preference based on subtle features
such as success with a strategy? 

This section describes experiments that are concerned with the extent to which
people can fine-tune their control over what strategy they use for question-answering. It 
is also concerned with uncovering what factors extrinsic to the question affect strategy- 
selection. These experiments also provide further support that people can and do select 
a strategy prior to executing one for question-answering. 

Experiment 1: CAN WE ADJUST OUR STRATEGY PREFERENCE TO 
MIRROR THE RATIO OF PRESENTED TO NON-PRESENTED STATEMENTS 
IN A STORY? 

To what extent can people discern the effectiveness of a particular strategy and 
adjust the tendency to select a strategy on the basis of its perceived effectiveness? To 
address this question, subjects read short, mildly interesting stories and then were asked 
questions about them. After each story, subjects were asked to judge whether the test 
probe was plausible, given the story. Unlike previous experiments of this type, the
percentage of explicitly presented inferences varied from the usual 50%. For half of the
subjects, 80% of the plausible statements were presented in the stories, and for the 
other half of the subjects only 20% were presented. Of interest were whether the 
propensity to use a strategy was affected by the different ratios of presented to not- 
presented sentences, and how quickly subjects adjusted their strategies to adapt to 
these ratios. 

Materials. Ten stories written by five different authors were used. The questions
about the stories and the stories themselves had been used previously (Reder,
1976; 1979; 1982). The questions were of three types: highly plausible, moderately
plausible, and implausible. The plausibility of plausible statements had been determined
by previous subject ratings. Half of the implausible statements were contradictions.
(Contradictions had not been used in the previous studies.) Each contradictory
statement was an exact restatement of a statement from the story except that one word
was replaced by its opposite. Contradictory statements were considered "presented"
implausible. The implausible and contradictory statements did not vary systematically on
implausibility. They were constructed by the experimenter with the constraint that non-
contradictions not refer to any specific statement in the story. 2 Table 1 gives an
example story with implausible and contradictory statements. 

Design and Procedure. There were four factors in this experiment: whether the 
ratio of presented statements to non-presented statements favored the direct-retrieval 
strategy or the plausibility strategy, whether the statement itself was highly or moderately
plausible, whether the statement had actually been presented in the story and whether
the bias was still present. This last factor was manipulated by the following design of
the materials: The first six of the ten stories had either the 80/20 split or the 20/80 split
of presented to not-presented plausible statements; however, after the first six stories, 
the remaining four stories always returned to the more conventional 50/50 split. 

The experiment was conducted on an IBM personal computer. Subjects read ten
stories which were titled and presented in random order. For each story, subjects
controlled the rate at which statements appeared on the screen; each time the space
bar was pressed, a new statement appeared, so long as the previous statement had
been on the screen for a minimum of 0.5 seconds. Subjects were tested after reading
each story. Subjects were asked to decide whether or not each statement was
plausible, i.e., consistent with the information in the story just read. Subjects were told
to indicate that a statement was plausible or implausible by pressing the "K" or the "0"
key, respectively. They were further instructed to keep their index fingers on these keys 
at all times while judging the statements, since response times would be recorded, and 
to respond as quickly as possible, without sacrificing accuracy. 

Subjects. College-age subjects who read ads on the Carnegie-Mellon and 
University of Pittsburgh campuses were recruited. Thirty subjects were randomly 
assigned to the Inference-bias Condition (i.e., only 20% of the plausibles were stated in 
the stories for the first six stories) and 29 to the Direct-retrieval-bias Condition (i.e., 80%
of the plausible probes had been presented in the stories). 

RESULTS 

Below in Table 2 are listed the mean response times to make plausibility judgments as
a function of four factors: the plausibility of the statement, whether the statement had
been presented or not, the direction of bias (80% presented, biasing direct retrieval,
versus 80% not presented, biasing plausibility), and whether the ratio of presented to
not-presented had reverted to 50:50. Medians of correct response times in each condition
for each subject were computed. The times presented here represent the mean of 
these medians. If a subject had no correct response times in a condition, the cell was 
estimated by taking the grand mean and subtracting from it both the effect size for that 
condition and the effect size for that subject. Less than .003 of the cells needed to be
estimated in that fashion. Figure 2 presents the difference between moderately and
highly plausible response times as a function of biasing condition, collapsed over the
stated/not-stated factor. It is plotted for stories 1-6, where bias existed, and stories 7-10,
where the ratio reverted to 50:50. Figure 3 presents the difference between stated and
not-stated, collapsed over plausibility.
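
The scoring procedure just described (per-subject medians of correct response times, estimation of empty cells, then means of those medians) can be sketched as follows. This is only an illustration under assumptions: the function names and the data layout (a dict keyed by `(subject, condition)`) are hypothetical, and the sign convention for an "effect size" (grand mean minus the marginal mean) is assumed, since the text does not spell it out. Under that convention the estimate for an empty cell equals condition mean plus subject mean minus grand mean.

```python
from statistics import mean, median


def subject_medians(correct_rts):
    """Collapse each (subject, condition) cell to the median of its correct RTs.

    Cells with no correct responses are returned as None.
    """
    return {cell: (median(times) if times else None)
            for cell, times in correct_rts.items()}


def fill_missing_cells(medians):
    """Estimate empty cells from the grand mean and the two marginal effects."""
    observed = {cell: v for cell, v in medians.items() if v is not None}
    grand = mean(list(observed.values()))
    subjects = {s for s, _ in medians}
    conditions = {c for _, c in medians}
    # Assumed sign convention: effect = grand mean - marginal mean.
    # A subject with no observed cells at all would raise StatisticsError here.
    subj_effect = {s: grand - mean([v for (s2, _), v in observed.items() if s2 == s])
                   for s in subjects}
    cond_effect = {c: grand - mean([v for (_, c2), v in observed.items() if c2 == c])
                   for c in conditions}
    return {(s, c): (v if v is not None
                     else grand - cond_effect[c] - subj_effect[s])
            for (s, c), v in medians.items()}


def condition_means(filled):
    """Mean of the per-subject medians in each condition."""
    conditions = {c for _, c in filled}
    return {c: mean([v for (_, c2), v in filled.items() if c2 == c])
            for c in conditions}
```

The difference scores plotted in the figures would then be simple subtractions between the resulting condition means, for example moderately plausible minus highly plausible.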

Insert Table 2 about here 



There are a number of interesting patterns worth noting. Differences were plotted 
rather than the raw data because this way, it is easier to see how dramatic the effects 
are. Figure 2 plots the extent to which there was an effect due to plausibility of the
question (moderately plausible RT minus highly plausible). The effect due to plausibility 
of the question was much greater for subjects biased to use the plausibility strategy. 
This was true for both questions that had been presented in the story and for those that 
had not. The results shown in Figure 3 are in stark contrast. Figure 3 plots the extent 
to which there was an effect due to whether the question had been presented in the
story (not-stated minus stated). In this case, there was a large effect for subjects
biased to use the direct retrieval strategy, just the opposite of Figure 2, which showed
the plausibility effect. These different trends for the two groups are exactly what one 
would expect if the ratio of questions previously presented to not previously presented in 
the story actually did bias subjects' preference for a particular strategy. 

An ANOVA was performed on the data from the first six stories to determine 
whether the contrasts mentioned above were significant. For brevity, the reporting of 
standard results, e.g., main effects due to whether the probe was stated or not, or the
plausibility of the probe, will be omitted. All expected replications were obtained. There
was a significant interaction on RT between bias (whether 80% of the probes were
stated or only 20% were stated) and whether or not the probe was stated in the story,
F(1,57) = 16.1, p < .01. This interaction represents the finding that the difference between
stated and not-stated items was greater in the condition where the ratio of items biased
subjects to adopt the direct-retrieval strategy. 3

The interaction of plausibility with bias in strategy-selection also significantly 
affected RT, F(1,57) = 6.2, p < .05 for the first six stories, such that the difference
between highly and moderately plausible statements was larger for subjects who were
biased to use the plausibility strategy. (There was no interaction for the last four 
stories.) 

Insert Figures 2 and 3 about here 



There were also some interesting results with respect to how subjects readjusted to 
a 50/50 split and how easily they altered their strategies. For example, there was a 
significant interaction of stated x bias x "half" (first 6 stories vs. last 4), F(1,57) = 11.8,
p < .01. That statistic represents the fact that when the bias manipulation stopped, the
stated/not-stated difference decreased for subjects originally biased to use direct retrieval
and increased for subjects originally biased to use plausibility. The effect became
equivalent for the two groups. Another triple interaction which illustrates the same idea
was the interaction of probe plausibility x bias x "half", F(1,57) = 8.1, p < .01. Originally
the plausibility effects were much bigger for subjects biased to use the plausibility
strategy, but they got smaller when the bias disappeared. Conversely, the small
plausibility effect for subjects biased to use direct retrieval increased when the bias
disappeared. Although the plausibility effect appears to have reversed itself when the
bias was removed, the plausibility x bias interaction was not significant for the last four
stories, p > .10.

DISCUSSION

In this experiment, the official task, judging plausibility, was identical for both groups of 
subjects, so all differences in performance were due to the effectiveness of a particular 
strategy. These results strongly suggest that people are sensitive to the parameters of 
the situation in which they find themselves and can rapidly alter the strategy they 
employ. In past experiments, subjects were shown to adjust their strategy as the delay 
increased, perhaps because they knew that the retrieval strategy would be less effective
at a delay. Past studies also showed that preference for a strategy depends on what 
subjects are actually asked to do, viz., make recognition judgments or make plausibility 
judgments. The next study examines whether that result is due to official demands, per 
se, or only to subjects' sensitivity to the probability of success with a strategy in a given 
task. 

Experiment 2: WHAT STRATEGY IS PREFERRED WHEN BOTH 
STRATEGIES ALWAYS WORK? 

It is conceivable that strategy selection would not be affected by official task 
instructions if either strategy worked equally well for the required task. In past studies 
(e.g., Reder, 1982), where subjects were asked to make recognition judgments, they 
were more inclined to use the direct-retrieval strategy. That greater tendency to select 
the direct retrieval strategy may have occurred because they did not want to make all 
the errors that would result from adopting the plausibility strategy, and not because of 
the official task demands. For this reason, I conducted a study where either strategy
could apply equally well, and looked to see the effects of official task demands and 
delay on strategy selection. 

In this study, all plausible statements were presented in the story and no 
implausibles were presented. Therefore, subjects could use either strategy and always 
be correct (assuming that they did not forget the statement or make an erroneous
plausibility judgment). Nonetheless, I expected subjects to show less use of the
plausibility strategy when instructed to make recognition judgments.

METHOD 

Design and Materials. The general design and materials were similar to Experiment 
1 with several important differences. Half of the subjects were randomly assigned to 
make recognition judgments and the other half to make plausibility judgments. Instead 
of varying the proportion of plausible statements included in the story, all plausible
statements to be judged about a story had been included as part of the story. Half of
the subjects assigned to each task were tested after each story and the other half of
each task were tested after reading all ten stories. 

The stories and test statements were the same as in Experiment 1 except that all
plausible test items were presented in the story. None of the implausible statements 
were contradictions. In this way, subjects asked to make recognition judgments could 
use the plausibility strategy without making errors and subjects asked to make plausibility 
judgments could use the direct-retrieval strategy without making errors. 

Procedure. The experiment was quite similar to the procedure of Experiment 1, 
with only a few relevant changes. Depending on condition, subjects were either told 
that they would be questioned after reading each story or after reading all ten stories. 
Each story's test statements were preceded by the story's title in the Delay Condition.
Subjects assigned to the Recognition Condition were asked to decide whether or not 
each statement was exactly the same as one that they had read in the story, and not 
to respond affirmatively because a statement seemed true, if it had not actually been 
read.

Subjects. Sixty-two subjects were recruited from the department's summer-time
subject pool and randomly assigned to conditions. In the Immediate Condition, there 
were 17 subjects asked to make recognition judgments and 14 asked to make plausibility 
judgments. At the longer delay (approximately 20 minutes, after reading all ten stories),
there were 16 assigned to the recognition task and 15 to the plausibility task. Subjects
received $4.50 for participating both in this 30-40 minute experiment and in one other
that followed. 

RESULTS AND DISCUSSION 

Table 3 presents the data from Experiment 2, organized by delay, task and plausibility of 
the statements. These data are the means of subjects' correct median response times 
and the proportion of correct trials per condition. Analyses of variance were performed 
on the median correct response times and accuracy data using the same factors
mentioned above. Analyses were done using different contrasts of plausibility: plausible
vs. implausible, and highly plausible vs. moderately plausible. 

Insert Table 3 about here 

Note that at the short delay interval, there was a 115 msec. advantage for the
highly plausible statements over the moderately plausible statements in the plausibility
task. This difference grew to a 266 msec. advantage at the longer delay, an increase
of 151 msec. When subjects were asked to make recognition judgments, initially they
were actually slightly slower (less than 25 msec.) for highly plausible statements than for
moderately plausible statements, but here, too, the tendency to adopt the plausibility
strategy increased: At the delayed test, the advantage of the highly plausible grew to 90
msec., an increase of nearly 115 msec. The contrast using only plausible statements
showed a marginally significant growth in the plausibility effect for both task types,
F(1,58) = 2.8, p<.10.

The ANOVA that contrasted highly with moderately plausible statements produced a
significant interaction of plausibility and task on response times, F(1,58) = 4.0, p<.05,
such that the difference between highly and moderately plausible statements was bigger 
for subjects actually asked to make plausibility judgments. The comparison of plausible
statements with implausible statements also interacted significantly with task instructions
for both accuracy and response times, F(1,58) = 16.9 and 15.8, respectively, p<.01:
Apparently, it is easier to say that an implausible statement was not presented (both in 
terms of RTs and accuracy) than to judge it as implausible. 
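
As a check on the arithmetic, the growth in the plausibility effect can be recomputed
from the differences quoted above. The sketch below is illustrative only: the
dictionary layout and the function name are hypothetical, and the short-delay
recognition entry is set to -25 msec., the bound implied by "less than 25 msec.",
not a value read from Table 3.

```python
# Plausibility effect = RT(moderately plausible) - RT(highly plausible), in msec.
# Figures are the differences quoted in the text, not the cell means of Table 3.
PLAUSIBILITY_EFFECT_MSEC = {
    ("plausibility", "short"): 115,
    ("plausibility", "long"): 266,
    # Assumption: "slightly slower (less than 25 msec.)" is entered at its bound.
    ("recognition", "short"): -25,
    ("recognition", "long"): 90,
}


def growth_with_delay(task):
    """Increase in the plausibility effect from the short to the long delay."""
    long_effect = PLAUSIBILITY_EFFECT_MSEC[(task, "long")]
    short_effect = PLAUSIBILITY_EFFECT_MSEC[(task, "short")]
    return long_effect - short_effect
```

Under these assumptions the plausibility task yields the 151 msec. increase reported
above, and the recognition task yields an upper bound that matches the "nearly 115
msec." figure.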

These data make clear that strategy-selection preference is affected by official task 
requirements even when the strategy-selection has no impact on performance. A
different way of putting it is that subjects do not just follow instructions because they do 
not want to make errors. Still, the delay between study and test also seemed to 
influence strategy-selection. The present experiment can not tell us, however, whether 
the shift in strategy preference with delay is due to a conscious decision, i.e., being 
aware of the delay, or whether it is due to an impression that the information is now 
less available. 4 

Experiment 3: CAN PEOPLE SWITCH STRATEGIES FROM QUESTION TO 
QUESTION BASED ON ADVICE PRECEDING EACH ONE? 

The previous studies examined the differences due to instructions, e.g., asking
subjects to make recognition or plausibility judgments. These instructions were given
only once, prior to reading the stories. Subjects could develop a "frame of mind" and
could keep the same bias for the duration of the experiment. Experiment 1 showed that
subjects can and do shift their bias during an experiment, but the shift shown in that
experiment was not from question to question. It is unclear whether subjects can
consciously alter strategy selection from question to question, at a moment's notice. The
ability to rapidly switch strategies might be fairly unconscious and not something a
person can self-instruct in a matter of seconds. 

In this experiment all subjects were asked to make plausibility judgments.
However, before each question appeared on the computer screen, the subject was given
"advice" by the computer as to which strategy was likely to be easier to use to answer
the next question. For example, if the subject was to judge the plausibility of a
statement that had been (recently) presented, the computer might advise that the subject
"search memory" to try to find the relevant statement. If the statement had not been
presented, the advice might be to try to "infer" the answer rather than searching for it
directly. To make the advice useful, and in order to see whether the advice was having
any effect, the advice was appropriate the majority of the time, but not always. On 
80% of the trials, the advice was appropriate, and on 20% of the trials, the advice was 
inappropriate. 

METHOD 

Design and Materials. The design consisted of the factors of probe-plausibility 
(highly vs. moderately vs. not plausible), probe presentation in story (stated vs. not- 
stated), and advised strategy (inference vs. direct-retrieval of statement). Whether the
advised strategy was appropriate or inappropriate depended on the recommendation and 
on whether the statement had been presented in the story. 

The stories were the same ten used in Experiments 1 and 2. The questions were 
also the same as those in Experiment 1 (three of the six implausibles per story 
contradicted a presented statement). Contradictory statements were considered 
presented, not-plausible statements. 
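
The rule for classifying advice as appropriate or inappropriate can be written out as
a small function. This is a minimal sketch rather than the program used in the
experiment: the function name and the labels "retrieve" and "infer" are hypothetical,
and the only logic assumed is what is stated above, namely that retrieval advice is
appropriate for presented probes (including contradictions of presented statements)
and inference advice is appropriate for probes that were not presented.

```python
def advice_is_appropriate(advice, stated, contradicts_stated=False):
    """Return True when the advice matches the probe's status in the story.

    advice: "retrieve" or "infer" (hypothetical labels for the two kinds of advice).
    stated: True if the probe itself appeared in the story.
    contradicts_stated: True if the probe contradicts a presented statement;
        such probes are counted as presented, as described in the text.
    """
    if advice not in ("retrieve", "infer"):
        raise ValueError("advice must be 'retrieve' or 'infer'")
    presented = stated or contradicts_stated
    if advice == "retrieve":
        return presented
    return not presented
```

Note that "inappropriate" here does not mean the advised strategy must fail: inferring
the answer to a stated probe still works, it is just not the advice that matches the
probe's status.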

Before each trial subjects received advice, half of the time to search for a specific 
fact, and half to try to judge whether the statement was plausible. Half of the plausible 
statements were presented in the story for all subjects and no implausibles or
contradictories were presented. It was still possible to design the program to have the 
advice be correct exactly 80% of the time. As in all studies, assignment of questions to 
condition was randomly determined, as was order of presentation of stories, with one 
exception: for the first story pair, only correct advice was given so that subjects would 
more rapidly learn to attend to the advice. 

Procedure. The procedure was very similar to Experiments 1 and 2. Questions 
were asked about stories after every two stories. The first of a pair was questioned, 
then the second. The relevant part of the instructions said:

"...Before seeing each statement, the computer screen will advise you to use a
particular strategy to judge the plausibility of the statement. This advice is based on
whether the statement or its contradiction was actually presented in the story. You will be
told either 'Try to retrieve a specific fact to use in judgment' or 'Try to infer the answer.'
Most of the time the advice is going to be helpful. That is, when you are advised to
retrieve a fact, either that fact or its contradiction was stated in the story. Similarly, when
advised to infer, most of the time neither the fact nor its contradiction was stated. 
However, occasionally the advice will be wrong, such that the opposite advice would have 
been correct. Should the advice be wrong, still try to answer the question correctly." 

The instructions continued with concrete examples, and advice about maintaining 
speed and accuracy and keeping index fingers on the response keys during the testing 
phase. The advice that preceded each question was displayed for a minimum of 0.5 
seconds. Then hitting the space bar allowed the question to be presented.

Subjects. Eighteen undergraduates enrolled in psychology classes at C-MU
participated for one credit toward a course requirement. One subject's data were lost
due to technical difficulties with the computer.

RESULTS AND DISCUSSION 

Means of median correct response times and proportion of correct responses are 
displayed in Table 4 as a function of advice suggested, plausibility of the probe and 
whether the probe had been stated in the story. The suitability or appropriateness of 
the advice is indicated by a (+) or (-). For example, for the not-stated items, the
inference advice is appropriate and the direct retrieval advice inappropriate. Thus, the
line above the not-stated items says "Inf.(+) Dir. ret.(-)". The data from the first story
pair are not included because no incorrect advice was given while attempting to get 
subjects to attend to the advice. 

Insert Table 4 about here 



An ANOVA was performed using the factors of probe plausibility, probe presentation 
and strategy advised. The first thing to note is that subjects were significantly faster if 
the advice was correct, F(1,16) = 5.59, p<.05 (interaction of advice and probe
presentation). That is, subjects were faster with the direct-retrieval advice when the 
probes were stated in the story and were faster with the inference advice when the 
probes were not stated in the story. The RTs for not-stated probes when direct-retrieval 
was advised were significantly slower than all others, t(32) = 3.82, p<.01, because this
was the only condition where the advised strategy would not work. (The advice to infer 
when the probe was stated will work; however, it is non-optimal since at such a short 
delay, direct-retrieval is a faster strategy.) For highly plausible statements, the suitability 
of the advice had little effect. This suggests that many highly plausible statements were
inferred by the subject when they had not been explicitly stated.



A second pattern worth mentioning is the difference in RT between the moderately 
and highly plausible statements as a function of strategy advised and whether or not the 
advice would work. The interaction of stated x plausibility (alternatively called
CORRECTNESS OF ADVICE X TYPE OF ADVICE X PLAUSIBILITY) was significant, F(2,32) = 6.54,
p<.01. For stated probes, when searching for specific facts was advised, there was
less than a 50 msec. difference between the moderately and highly plausible
statements. For these same probes, the difference was over 300 msec. if inference was
advised. For not-stated probes, there was a large difference due to plausibility
regardless of strategy advised since the inference strategy had to be executed ultimately.
When direct retrieval was advised, the moderately plausible were very slow, since two
strategies had to be tried; however, the highly plausible statements did not show this
pattern. This again suggests that the highly plausible inferences were found in memory
during the direct retrieval search (i.e., were inferred during reading), so that the second
strategy did not have to be evoked.

Suitability of the advice did not have a reliable effect on accuracy, but accuracy 
was affected by whether the probe had been stated in the story, F(1,18) = 7.20, p<.05. 5

Summary and Implications 

Experiments 1, 2, and 3 showed the extent to which strategy-selection can be 
influenced by factors extrinsic to the test question. Since the ratio of presented to not- 
presented statements affects the probability of success of the direct-retrieval strategy,
subjects in Experiment 1 adjusted their use of strategies accordingly. Experiment 1 also
showed that people quickly adapt to a change in this ratio. To balance Experiment 1,
Experiment 2 showed that subjects' strategy selection is affected by the official task
instructions even when either strategy (direct-retrieval or plausibility judgments) will
produce the correct response. There were larger plausibility effects in conditions where
subjects were actually asked to make plausibility judgments. The plausibility effects were
larger at longer delays for subjects in both task conditions, confirming that other 
variables also influence choice. 

The results from Experiment 3 give support to the idea that people can rapidly
alter the strategy they use to answer a question. Subjects could modify which strategy
they selected from question to question on the basis of an external cue such as "try to
find the fact in memory" or "try to infer whether this statement is plausible". 
Plausibility effects were small if the direct retrieval advice was given and would work. 
The difference in RT between questions that had been stated and those that had not 
been stated was small if the advice was to use plausibility. 

Experiment 3 clearly indicated that performance suffers when the wrong strategy is 
selected. Considering the results of this experiment, those of Experiments 1 and 2, and
all the evidence reviewed earlier, it seems clear that any question-answering model
requires a preliminary strategy selection phase. Given the existence of a strategy
selection stage, it is interesting to ask whether factors besides situational or contextual 
variables play a role in this strategy selection. The distinction here is between variables 
extrinsic to the test question and variables intrinsic to the question. The next section 
addresses this issue. 

The Role of Stimulus Evaluation
in Strategy Selection

There is reason to believe that, in addition to situational or extrinsic variables, 
variables intrinsic to the test question also play a role in strategy selection. Reder
(1982), Reder and Wible (1984) and Experiment 2 all show that subjects shifted their
strategy preference with delay. The explanation for this strategy shift was that at longer
delays information is less accessible, making direct retrieval less desirable. It is possible
that the decision to shift strategies was based on the subject's knowledge of the delay 
between study and test; however, I believe that subjects shifted strategy by evaluating 
the familiarity of the test questions. This is because in most (non-experimental) 
situations, people do not know in advance when the queried information was learned and 
consequently need to have some mechanism for assessing familiarity and choosing a 
strategy. 

The position that sentential or intrinsic variables should affect strategy selection is 
consistent with other views of question answering. A number of models have suggested 
that subjects make memory judgments by a two-stage process of (1) an initial memory 
evaluation, followed by (2) an optional, second process that more carefully inspects 
memory. For example, Atkinson and Juola (1974), in their analysis of word recognition, 
propose that subjects initially assess the "familiarity" of an item. If the item seems 
highly familiar (e.g., "I'm sure I've seen this recently"), they recognize the item; if it is
of low familiarity they reject it. For intermediate levels of familiarity, subjects have to
engage in a search of memory.
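
The two-criterion scheme attributed here to Atkinson and Juola (1974) can be sketched
as a simple decision rule. The code is a hedged illustration, not their model: the
function name, the 0-1 familiarity scale, and the two criterion values are
hypothetical placeholders chosen only to show the three possible outcomes.

```python
def familiarity_decision(familiarity, low_criterion=0.3, high_criterion=0.7):
    """Fast response on extreme familiarity, otherwise a search of memory.

    familiarity: value on an arbitrary scale (assumed here to run from 0 to 1).
    low_criterion, high_criterion: hypothetical cut-offs, not estimated values.
    """
    if low_criterion > high_criterion:
        raise ValueError("low_criterion must not exceed high_criterion")
    if familiarity >= high_criterion:
        return "recognize"      # item seems highly familiar: fast positive response
    if familiarity <= low_criterion:
        return "reject"         # item seems unfamiliar: fast negative response
    return "search memory"      # intermediate familiarity: slower, careful search
```

In this sketch the intermediate range always leads to the same second-stage process,
which is the feature of such models at issue in this section: the initial evaluation
gates further processing but does not choose among alternative strategies.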

Smith, Shoben, and Rips (1974) proposed a similar idea for category judgments
(e.g., "a chicken is a bird," "a canary is a bird"). They proposed that subjects
evaluated the raw similarity (or relatedness) between subject and predicate. Again, if
similarity was high, the statement was accepted; if low, it was rejected. For intermediate 
values, a careful inspection of the defining features was required, ignoring overlaps on 
"characteristic features". 

Glucksberg and McCloskey (1981) also have evidence consistent with a preliminary
stage that precedes careful inspection. In this case, the data suggested that subjects
make a quick exit if the preliminary inspection fails to find a connection among the
concepts. Subjects were much faster to respond "It is unknown whether John
possessed a gun" if they had not studied anything about John and guns than if they
had studied that exact proposition "it is unknown whether...". Presumably a first stage
determines whether there are connections between John and gun. If none, subjects can
say "unknown" rapidly. Otherwise a second stage carefully inspects the nature of the
connection. 

Some of the models that postulate a preliminary-evaluation stage assume that the
outcome of the evaluation not only determines whether or not to go on to do further
processing, but also how much time should be devoted to this second stage. For
example, Lachman, Lachman & Thronesberry (1979) postulate metamemorial processes
that regulate how long search continues for a specific piece of information.

Metamemory is accurate if it returns correct information about the contents in
store. It is efficient if it appropriately controls search durations so that more
time is allocated to seeking information actually present, less to information
actually absent. (p. 543)

Nelson, Gerler and Narens (1984) have found a positive relationship between the feeling
of knowing and the amount of time elapsing before a memory search was terminated
during recall. 

Despite all the empirical support for the models described above, it is still 
uncertain whether initial evaluation of a question can influence strategy selection. This is 
because all of these models assume that if processing goes beyond a preliminary 
evaluation, there is only one possible strategy to be employed. That strategy is 
assumed to be a careful inspection of memory, in contrast to the more sloppy process 
used for the preliminary evaluation (e.g., Atkinson & Juola, 1974; Glucksberg &
McCloskey, 1981; Lachman, Lachman & Thronesberry, 1979; Smith, Shoben & Rips,
1974). If a second strategy such as plausibility or inference is considered at all, it is
assumed to always follow the direct-retrieval strategy (e.g., Lachman & Lachman, 1980).
The initial evaluation proposed in models such as Atkinson and Juola is, in reality, the
first of a set of strategies to execute, and no selection is done at all. The goal of the
following sections is to give support to the hypothesis that initial evaluation of a question
can affect strategy selection. 

Can We Rapidly Evaluate Our Ability to Answer Questions? 

Past work on "feeling of knowing" has demonstrated that when people are unable
to answer a question, they are nonetheless able to accurately assess the probability that
they will be able to recognize the answer (e.g., Nelson et al.; Nelson & Narens, 1980;
Read & Bruce, 1982). Conceivably, the same type of mechanism that allows for these
"feeling-of-knowing" judgments operates automatically even when people can answer the
question. Given that question-answering requires a strategy-selection stage prior to
strategy execution, it seems reasonable that the mechanisms involved in "feeling of
knowing" could be involved in assessing which strategy is preferable. For this to be a
viable assumption, the "feeling of knowing" or initial evaluation would have to take
substantially less time than the total time it takes to answer a question. 

To see whether our "feeling of knowing" process could be involved in question-
answering, i.e., precede a strategy execution phase, Experiment 4 asks whether people
can judge their ability to answer a question faster than they can actually answer it. If
we do have a "feeling-of-knowing" process that enables us to select question-answering
strategies, then this assessment should operate faster as an explicit judgment than the
task of actually retrieving the answer to the question even though we have little practice
at overtly assessing our "feeling-of-knowing".

Experiment 4: GAME-SHOW: CAN PEOPLE ESTIMATE ANSWERABILITY
FASTER THAN THEY CAN ANSWER?

To tap into this immediate, "feeling-of-knowing" process and compare it to question-
answering, a "game show" format was developed. Subjects see a question and rapidly
decide whether they can answer it. If they respond "no" or wait too long after the
question has appeared on the screen, they lose the opportunity to answer the question. 

METHOD 

Overview. Subjects were asked to read questions pertaining to world knowledge 
and then, depending on condition, either answer the question aloud or say whether or 
not they thought they would be able to come up with the answer. In both conditions, 
subjects were encouraged to respond as rapidly as they could. Those in the Estimate 
Condition who said "yes" were then asked to come up with the answer to the question.

Subjects. Thirty-one Carnegie-Mellon undergraduates participated in the experiment
to partially fulfill a course requirement. Fifteen were randomly assigned to the Answer 
Condition and the other 16 to the Estimate Condition. 

Materials. Questions were constructed for four levels of difficulty: 20 extremely
difficult or virtually impossible-to-answer questions (e.g., What is Menachim Begin's
favorite dessert?); 20 difficult (e.g., Who was the first man to climb Mount Everest?); 21
questions of moderate difficulty (e.g., Where did the Greek gods live?); and 28 easy
questions (e.g., How many tentacles does an octopus have?). The questions were all of
approximately the same length; however, the primary concern was to compare 
performance between groups that both saw the same materials, so there was no attempt 
to ensure uniformity among sentence types. Assignment of questions to difficulty level 
was done on an intuitive basis, i.e., no norms were taken.

Procedure. The experiment was conducted on a terminal attached to a PDP/11
using the RSX-11 operating system. Attached to the terminal were a button-box, and a
voice-key with a microphone. Subjects were instructed about the task after their random
assignment to one of the two groups. Those in the Answer Condition were told to read
each question on the computer screen and to say the answer into the microphone "as
quickly as possible, without sacrificing accuracy". The answer triggered the voice-key
attached to the computer and the response latency was recorded. After responding,
subjects typed their verbal response into the computer. Subjects were asked to say
"don't know" as quickly as possible into the microphone when they did not know the
answer. 

Subjects in the Estimate Condition were instructed to say "yes" or "no" into the 
microphone as quickly as possible after seeing the question. For "yes" responses, the 
subjects were then asked to type in the answer to the question.

To motivate fast responding, response times appeared on the computer screen for 
a few seconds after each trial. Subjects controlled the rate at which they saw the 
questions by pushing a start-button to initiate the next trial. Subjects were also alerted 
to the problem of spurious triggerings of the voice-key due to random noises such as 
coughs, and to the problem of responses spoken too softly to activate the voice-key. 
On each trial, there was the opportunity to indicate that the response time was 
inaccurate due to early or late triggering of the voice-key. The experimenter was 
present at all times to ensure that subjects typed in the verbal response they gave and
to nullify trials where the voice-key did not accurately record RT. The presentation order
of the questions was randomly determined for each subject.

RESULTS

A scoring program scored obviously correct responses. All other answers to a given 
question were presented to a rater to score. Because the computer presented all 
answers for a given question together, the rater had no idea whether a particular answer
came from a subject in the Estimate Condition or the Answer Condition. 

Median response times were used to estimate each subject's performance in each 
condition. For each type of question, Table 5 gives the mean time to give the answer 
(Answer Condition) or say "yes" (Estimate Condition), the proportion of answers
attempted (i.e., questions for which subjects did not say "don't know" in the Answer 
Condition), the percentage of questions answered correctly and the "accuracy" of 
question-answering. Accuracy, viz. the ratio of proportion correctly answered over the 
proportion attempted, refers to how good the subject was at estimating what he or she 
knew. (For impossible questions, "can't say" was considered correct.) Table 5 also
gives the mean RTs for questions correctly answered and incorrectly answered, and for 
questions for which subjects said "can't say." These latter numbers are not partitioned 
by difficulty of question, to save space. The analyses of variance used a 2 (task) x 3
(question-difficulty-without impossibles) design for positive response times, regardless of
whether or not they were correct. 6 
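
The accuracy measure defined above is a simple ratio, and can be stated as a
one-line computation. The function below is only a sketch of that definition: its
name and argument names are hypothetical, the example value in the test is invented
for illustration rather than taken from Table 5, and it does not implement the
special scoring of "can't say" for impossible questions.

```python
def estimation_accuracy(proportion_correct, proportion_attempted):
    """Accuracy as defined in the text: proportion correct over proportion attempted.

    Both arguments are proportions of all questions of a given type, so the
    proportion correct can never exceed the proportion attempted.
    """
    if not 0 < proportion_attempted <= 1:
        raise ValueError("proportion_attempted must be in (0, 1]")
    if not 0 <= proportion_correct <= proportion_attempted:
        raise ValueError("proportion_correct must be in [0, proportion_attempted]")
    return proportion_correct / proportion_attempted
```

With this definition, a subject who attempts fewer questions but gets the same number
right is scored as more accurate, which is the sense in which the measure reflects how
well subjects estimate what they know.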

Insert Table 5 about here 



The most important result to note is that subjects asked to make estimates are 
over 25% faster than those asked to actually generate an answer. This difference is, of 
course, significant, F(1,29) = 20.0, p<.001. This is true whether one looks only at
positive estimates or both positive and negative estimates, so it can not be due to 
subjects merely saying a rapid "no" to anything they do not know very well in the
Estimate Condition. On the other hand, it is true that subjects in the Estimate Condition
attempt to answer fewer questions than do subjects in the Answer Condition; however, 
this difference is not reliable [F<1.0] and does not map onto fewer correct judgments in 
the Estimate Condition. The groups do not differ in terms of the number of questions
answered correctly, [F<1.0]. Since subjects in the Estimate Condition attempt fewer
questions, they have fewer erroneous attempts, making them significantly more accurate,
where accuracy is defined as proportion correct of those attempted, F(1,29) = 16.4,
p<.001. Clearly then, the speed advantage for the Estimate Condition can not be due
to a speed/accuracy trade-off, where subjects are merely stopping the same process too 
soon. 

There were significant effects due to difficulty of question, in terms of number of
questions attempted, number answered correctly and accuracy of attempts,
F(2,58) = 117.2, 205.4 and 24.7, respectively, p<.001. Not surprisingly, subjects were
less inclined to answer and less accurate with the more difficult questions. There was
also an interaction of question difficulty with task, such that the accuracy advantage of
the Estimate Group over the Answer Group was greater for more difficult question types,
F(2,58) = 4.1, p<.05. Surprisingly, there was no difference in response times due to
question difficulty. 7 On the other hand, in the next experiment, there is an effect of 
question difficulty on response times. 

There is one other noteworthy result. The data were analyzed as a function of 
practice, partitioning the data into the first 25% of the experiment and the last 75%. 
Since the estimation task is not one that most subjects are used to performing, it
seemed likely that any advantage of that condition would take time to develop. The
advantage of the Estimate Condition was 480 msec. during the first 25% of the
experiment, but grew to 986 msec. in the last 75%.

DISCUSSION

Although the data support the hypothesis that there is a mechanism that allows people 
to evaluate how much they know before they can actually answer a question, the results 
are open to other interpretations: In the Answer Condition subjects have to articulate a 
longer response than subjects in the Estimate Condition. There is evidence that 
subjects are faster to initiate an articulation of a short word (e.g., "yes") than a long 
word (e.g., "baseball") (e.g., Fowler, 1980; Klapp, 1974; Sternberg & Monsell, 1981).
Those effects, however, tend to be on the order of 10 to 15 msec., while the effects 
reported here are on the order of 800 msec. 

Given the sizeable advantage of the Estimate Condition, both in terms of response
time and accuracy of estimation, it seemed worth demonstrating that the effect was not
due to something as uninteresting as an advantage for getting to give the same "yes"
and "no" responses on multiple trials. Therefore, in order to control for any advantage 
due to binary responding, per se, Experiment 5 required subjects in both groups to 
make binary decisions prior to giving the answer. 

Experiment 5: BINARY RESPONDING FOR ESTIMATE VS. ANSWER.

This experiment was similar to Experiment 4 with several notable exceptions: 
Subjects in both groups first pushed one of two buttons. If they pushed the button 
indicating that they had the answer ready to give (Answer Condition) or thought they 
could answer the question (Estimate Condition), then they went on and said the answer 
into a microphone that recorded their latency to generate the answer. Two response 
times were collected for those questions that had an affirmative response, namely time 
for the positive response and time to articulate the answer. 

Getting subjects in the two conditions to treat the two tasks differently was not
trivial since the logical structure of the tasks was identical. The Answer Group was
penalized if they could not come up with the answer shortly after pressing the button.
Further, in the Answer Condition, the question was removed from the screen after
pressing the button since subjects were already supposed to have the answer in mind;
however, in the Estimate Condition, the question remained on the screen until the 
subject said the answer into the microphone. They were given unlimited time to give 
the answer after estimating that it was answerable. 

METHOD 

Materials and Design. Both the questions used and the design of the experiment 
were identical to Experiment 4. The only difference was in the collection of two
latencies per trial, rather than one. 

Procedure. The experiment was conducted on an IBM personal computer, to which 
a button-box, a microphone and voice-key were connected. Subjects were told that the 
experiment was similar to a television game show; they would accumulate points for their 
answers, which were redeemable for cash at the end of the experiment. 

Subjects assigned to the Answer Condition were instructed to press the green 
(right hand) button as soon as they were sure they had the answer to a question and 
were ready to say it, but to press the red button (left hand) if they were sure they did 
not know the answer. They were told to indicate their response only after having 
searched through memory and either finding or failing to find the answer. Those in the 
Estimate Condition were instructed to press the green button as soon as they thought 
they probably would be able to find the answer in memory, and to press the red button 
as soon as they thought they probably would not be able to find the answer. 

Both groups were given points for fast responding to the first button press; 
however, in the Answer Group, subjects had to speak the answer into the microphone 
within one second of pressing the button or else they lost points. The points awarded 
for each answer depended on button-press response time. Additionally, for affirmative 
responses, points depended on accuracy of answer and whether or not the verbal 
response was begun within the one-second time limit (in the Answer Condition). The 
scoring method was explained to both groups, and subjects were told how many points 
they had accumulated after each trial. They also earned points for negative responses. 
Because it was easier to amass points in the Estimate Group, the conversion of points 
into money was different for the two groups. 6 

Following an affirmative response, the computer screen prompted the subject to 
say the answer into the microphone. If a subject in the Answer Condition failed to give 
the answer within one second, the computer then prompted the subject to still attempt 
to give the answer (but implied that the response was late). After a verbal response 
was given, the correct answer was displayed on the screen and the experimenter scored 
the response. Questions were presented in random order. 

Subjects. Thirty-three undergraduates enrolled in their first psychology class 
participated to partially fulfill a course requirement. No subject had participated in a 
previous version of this experiment. Sixteen subjects were randomly assigned to the 
Estimate Condition and 17 to the Answer Condition. In addition to receiving course 
credit, they received nominal payment for performance in the task: 2.5 mils/point in the 
Estimate Group and 3.125 mils/point in the Answer Group, which averaged about $.£0 in 
bonus payment. 

RESULTS 

Table 6 presents the data in a format similar to Table 5. The estimation or attempt 
times for questions that were subsequently answered correctly (RT1) are given for each 
question type, instead of the times for all positive responses. (The difference between 
the two measures is very slight.) The times are also given for the correct articulation of 
the answers (RT2), and for the sum of these two response times (RT1 + RT2). 

Insert Table 6 about here 



The ANOVAs used the same factors as Experiment 4, and used correct RTs for 
phase 2 and the sum of these two times. The percentage of questions attempted did 
not differ for the two instructional groups, although it did differ as a function of the 
difficulty of the questions, F(2,62) = 153, p < .001. Percent correctly answered and 
accuracy did not differ across the two groups, but again, did differ with difficulty of 
question, F(2,62) = 180.85 and 18.44, respectively, p < .001. 

Of greater interest, time to estimate that a question could be answered was 
significantly faster than time to indicate an answer was "in mind", F(1,31) = 4.78, p < .05. 
Response times for the first phase also differed significantly as a function of question 
difficulty (unlike Experiment 4), F(2,62) = 4.24, p < .05, such that for easier questions, 
subjects were faster at being ready to give the answer or to estimate they could answer 
them. 

Time to generate the answer also differed significantly as a function of task 
instructions, F(1,31) = 9.18, p < .01, but in the opposite direction. Subjects in the Answer 
Condition were faster than those in the Estimate Condition, as they should be, to give the 
answer. Question difficulty did not affect time to give the answer, nor was the 
interaction with task significant. As the model would predict, the sum of the two 
response times did not differ significantly, F(1,31) = .02, for the two tasks, even though 
the time for each part differed significantly (but in opposite directions). The reason the 
model would predict that the sums would be roughly equivalent is that the estimate-phase 
RT should be a subset of the answer task's first phase, and the processing that 
the answer task did during the first phase, namely finding the answer in memory, should 
be included in the RT for the second phase for the Estimate Group. 9 
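
The additive logic behind this prediction can be sketched in a few lines. The stage names and durations below are hypothetical values chosen only for illustration (they are not estimates from Table 6), and the sketch assumes strictly serial, non-overlapping stages.

```python
# Hypothetical stage durations in msec (illustrative only, not fitted to the data).
EVALUATE = 600    # initial evaluation ("feeling of knowing")
SEARCH = 900      # careful memory search for the answer
ARTICULATE = 500  # saying the answer aloud


def phase_times(task):
    """Return (RT1, RT2) for a task, assuming strictly serial stages."""
    if task == "estimate":
        # The button press follows the initial evaluation; search happens afterwards.
        return EVALUATE, SEARCH + ARTICULATE
    if task == "answer":
        # The button press waits until the answer has been found.
        return EVALUATE + SEARCH, ARTICULATE
    raise ValueError(f"unknown task: {task}")


est_rt1, est_rt2 = phase_times("estimate")
ans_rt1, ans_rt2 = phase_times("answer")
```

Under these assumptions the Estimate task has the shorter first phase and the longer second phase, while the two sums are identical, which is the qualitative pattern reported above.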

DISCUSSION 

Experiment 5 replicates the findings of Experiment 4, that subjects can estimate that 
they can answer a question significantly faster (without sacrificing accuracy) than they 
can actually find the answer. That is, subjects are at least as accurate at estimating 
answerability as they are at attempting the answer. It is unlikely that this advantage is 
due to something trivial such as difficulty in articulating the response since both groups 
made a binary decision followed by the complete answer. Taken together, these data 
support the proposal that we have the capability to assess our memories before we do 
a careful search of memory. 

The research reviewed at the beginning of the paper and the first set of 
experiments argued strongly that strategy selection is part of question-answering. These 
last two experiments have shown that a sentence can receive an initial evaluation quickly 
enough to make it reasonable that intrinsic variables can also influence that selection. 
Below I outline the kinds of mechanisms and cognitive factors that might be involved in 
the initial evaluation process. 

Mechanisms for Evaluating "Feeling-of-Knowing" 

The processes involved in the initial evaluation of a question might include (1) 
determining how recently the terms in the question have been encountered, and (2) 
measuring the extent of knowledge stored in memory relevant to the question. These 
processes are assumed to operate in the context of a semantic network in which 
concepts in memory can become "active" from the terms in the test question. Recency 
(to be referred to as familiarity) and extent of related information are measured in terms 
of activation. 

Familiarity 

Although there have not been models concerned with how feeling of knowing might 
affect strategy selection, there are theories concerned with how familiarity is determined. 
For the most part, these theories (e.g., Jacoby & Dallas, 1981; Hasher & Zacks, 1979; 
Hintzman, Nozawa, & Irmscher, 1982; Mandler, 1980; Zacks, Hasher & Sanft, 1982) have 
postulated two separate mechanisms for judging familiarity. Although these ideas have 
been applied mostly to tasks concerned with recognition or frequency judgments, they 
can be incorporated into a framework which is concerned with more complicated types 
of memory queries. 

Mandler (1980) has argued for two separate types of recognition processes, one 
that measures familiarity or occurrence information, and a second that is a much 
slower, more careful retrieval mechanism or search. He suggests that the first type of 
process is affected by shifts in modality (e.g., auditory during study, but written at test) 
and that this familiarity/occurrence information decays faster than the 
propositional/symbolic information; however, the familiarity process also can execute faster 
during recognition. This proposal of an automatic process that recognizes familiar traces 
is similar to the proposal of Hasher and Zacks (1979) that there is an automatic 
mechanism that keeps track of frequency information. Hasher and Zacks, like Mandler, 
also postulate a more "controlled" (non-automatic) memory mechanism. They found that 
many variables which affect recall performance do not affect the frequency judgment 
performance, i.e., the latter process does not degrade with increased processing loads, 
age, etc. Hintzman, Nozawa, and Irmscher (1982) also have data consistent with these 
theories. Jacoby and Dallas (1981) make similar proposals as well; specifically, they 
postulate two memory types, an autobiographical form of memory and a "less aware" 
form of "perceptual learning." They note that levels of processing (see Craik & Lockhart, 
1972) affect recognition memory but not perceptual recognition. Perceptual recognition 
might be thought of as a physical match, and is like the mechanism that keeps track of 
frequency information for Hasher and Zacks and Hintzman et al., and is also like the 
fast recognition-memory mechanism of Mandler. 

In the present framework, the recency of exposure to a concept in the 
question is measured by how active it is relative to its base-activation level. 10 So, for 
example, if a story mentioned certain words often and some of those words were 
contained in a test statement, the feeling-of-knowing mechanism would probably register 
high familiarity. 

Related Knowledge in Memory 

In addition to the fast process of determining "raw familiarity", this initial evaluation also 
measures the "relatedness" of the concepts in the question through the interconnections 
in memory. The proposal that relatedness affects decision times is also not new. For 
example, Rips, Shoben and Smith (1973) postulate differences in feature overlap to 
explain the faster categorization times for dominant instances. Some research of my 
own (Reder & Anderson, 1980; Reder & Ross, 1983) also suggests that subjects can 
use a relatedness judgment to by-pass retrieval of a specific fact from memory. 
"Relatedness" (for initial evaluation) is defined as the degree to which words in a 
question cause activation to intersect in memory. The more intersections detected in 
memory as a result of a query, the more potentially relevant information is available for 
question-answering. 

These two feeling-of-knowing processes, familiarity-detection and intersection-detection, 
go on in parallel. It is a useful heuristic to assume that when familiarity is 
high, the statement was seen recently and should be relatively accessible. Direct retrieval 
is a faster and easier strategy than judging plausibility when the specific fact that must 
be found is relatively accessible. Therefore, when familiarity is high, direct retrieval 
should be the preferred strategy. 

When the process that detects intersection of activation determines that there is a 
lot or a moderate amount of potentially relevant information, there is a bias to use the 
plausibility strategy. When both biases exist, then the bias to use plausibility is 
superseded by the bias to use direct-retrieval. This is because plausibility always has a 
longer computation stage, and if the memory search is relatively easy, plausibility does 
not have the compensating search time advantage to make it the preferred strategy 
(Reder, 1982). Questions that produce little activation are immediately "recognized" as 
unanswerable (e.g., what is the rate of mitosis in paramecia?). 11 
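
The biases described in the last two paragraphs can be summarized as a small decision rule. This is only a sketch: the 0-to-1 activation scale, the three threshold values, and the "no_intrinsic_bias" outcome (for cases the text does not address, which would presumably be left to contextual factors) are all assumptions introduced for illustration.

```python
def select_strategy(familiarity, intersections,
                    familiarity_threshold=0.7,
                    intersection_threshold=0.3,
                    activation_floor=0.1):
    """Map two initial-evaluation signals (each assumed to lie in [0, 1]) to a bias.

    All thresholds are hypothetical; the framework itself specifies only the ordering
    of the biases, not numeric values.
    """
    if familiarity < activation_floor and intersections < activation_floor:
        # Little activation of any kind: judged unanswerable at once.
        return "unanswerable"
    if familiarity >= familiarity_threshold:
        # High familiarity wins even when many intersections are detected.
        return "direct_retrieval"
    if intersections >= intersection_threshold:
        # A moderate or large amount of related information favors plausibility.
        return "plausibility"
    # Assumption: with no intrinsic bias, selection is left to extrinsic factors.
    return "no_intrinsic_bias"
```

For example, `select_strategy(0.9, 0.8)` yields the direct-retrieval bias even though the intersection signal is also high, mirroring the claim that the plausibility bias is superseded when both are present.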

This next experiment tests a few of the implications of this initial evaluation based 
on intrinsic features. It is designed to see whether our "feeling-of-knowing" is really 
based on things like familiarity (recency of exposure) and number of intersections in 
memory even when it turns out that these features have no predictive validity as to the 
subject's knowing the answer. 

Experiment 6: CAN OUR ESTIMATION PROCESS BE SUBVERTED BY 
SPURIOUS FAMILIARITY? 

Like Experiments 4 and 5, in this experiment half of the subjects were asked to 
estimate the answerability of questions, and the other half to answer them directly. 
Before subjects began the question-answering or estimation phase, they were asked to 
rate the frequency of occurrence of some terms or pairs of terms. These terms were 
selected from a random third of the questions to be estimated and/or answered. Rating 
terms that would be seen as part of a question was the "priming" manipulation. Half 
of the subjects were asked to rate pairs of terms on conjoint frequency (where both 
terms were taken from the same question); the other half rated terms in isolation. For 
both types of rating groups, the same terms from a question were rated if a question 
was to be primed. For this reason, subjects who rated pairs had exactly half as many 
rating trials. 

In the past, subjects asked to estimate whether they could answer a question were 
more accurate than subjects actually asked to answer them. The prediction is that by 
priming words from a question, subjects would be "thrown off" in terms of using the 
mechanisms they normally use to judge answerability. The Estimate Group should 
estimate that they can answer more primed questions than unprimed, at least for those 
questions that are difficult, i.e., those that they would not otherwise judge as answerable. 
Question difficulty was varied systematically so that this prediction could be tested. The 
Answer Group was not expected to give more answers to the primed questions; however, 
they were expected to take longer to say that they could not answer a question (i.e., 
search longer for the answer before giving up) if it had been primed. Also of interest 
was whether the effect of priming differs depending on whether the terms were rated 
together or separately. 

METHOD 

Materials. Questions were selected from a set that has been normed for 
answerability (Nelson & Narens, 1980). Three levels of difficulty were used: 21 
questions were selected from the most difficult of Nelson and Narens's question set 
(mean recall = 6.5%); 21 from mid-range (31.8%); and 21 from the easiest third 
(71.5%). For each question, two terms that seemed (a) least common, and (b) most 
"central" to the question, were selected to be the candidate priming words. For 
example, for the question "What term in golf refers to a score of one under par on a 
particular hole?", the priming terms were "golf" and "par". 12 Candidates were needed 
for all questions, since those selected for priming within each level of difficulty were 
randomly determined for each subject. In addition to these 63 questions, 15 easy 
practice questions were used. 

Design. There were two between-subject factors: task group (estimate vs. 
answer), and type of priming or rating task (rating individual terms vs. rating term-pairs). 
There were also two within-subject factors: question difficulty (easy vs. moderate vs. 
difficult), and whether the question was primed by the rating task or not (primed vs. 
unprimed). Half as many questions were primed as unprimed because subjects might 
have become suspicious of the priming manipulation and attempted to alter their 
"feeling-of-knowing" strategy if too many questions were primed. Each level of difficulty 
had seven primed and 14 unprimed questions. 
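
The per-subject random selection of primed questions described in the Materials and Design paragraphs can be sketched as follows. The counts (21 questions per level, seven of them primed) come from the text; the question identifiers, function name, and use of a seeded generator are hypothetical choices for illustration.

```python
import random

LEVELS = ("easy", "moderate", "difficult")
QUESTIONS_PER_LEVEL = 21
PRIMED_PER_LEVEL = 7


def assign_priming(question_ids_by_level, rng):
    """Randomly choose, for one subject, which questions at each level are primed."""
    primed = {}
    for level in LEVELS:
        ids = question_ids_by_level[level]
        chosen = set(rng.sample(ids, PRIMED_PER_LEVEL))
        for qid in ids:
            primed[qid] = qid in chosen
    return primed


# Hypothetical identifiers such as "easy-00"; the real items are the normed questions.
questions = {
    level: [f"{level}-{i:02d}" for i in range(QUESTIONS_PER_LEVEL)]
    for level in LEVELS
}
assignment = assign_priming(questions, random.Random(0))
```

Each call with a different generator state gives a different subject's assignment, while the 7:14 ratio of primed to unprimed questions is preserved within every difficulty level.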

Procedure. Subjects were seated in front of an IBM PC and randomly assigned to 
one of four conditions. (They were unaware initially as to whether they would be making 
estimates or directly answering questions.) Before the question phase began, subjects 
were told to rate the terms that would be displayed on the screen. Subjects were 
asked to rate, on a five-point scale, how often a term was encountered during reading 
or listening, or how often the pair of terms was encountered together, i.e., conjoint 
frequency. The order of presentation of the terms or pairs was randomly determined for 
each subject with two constraints: terms from the same sentence could not be 
presented sequentially in the "single" conditions, and practice items always were rated 
first. 

Following the rating task, subjects were instructed about the question-answering 
phase, which was similar to Experiment 4. The Answer Group spoke their answers 
directly into the microphone, and the Estimate Group said "yes" or "no" orally before 
giving an answer. 

Subjects. There were 76 subjects: 18 in the Estimate-single Group, 20 in the 
Estimate-pair Group and 19 in each Answer Condition. Forty-seven subjects were paid 
$4.50 for participating in this and one other experiment. The others were given course 
credit. The paid subjects were either students or staff from Carnegie-Mellon, while those 
receiving credit were students participating in their first psychology course. 

RESULTS 

Table 7 is organized so that the data from subjects in the Estimate Groups are 
presented on top; the data from subjects in the Answer Groups are given in the 
lower panel. The data are given on separate rows for the three levels of question 
difficulty. Each row gives the proportion of questions attempted, the time to attempt an 
answer (say "yes" in the Estimate Condition), and the time to say "don't know" (or 
"no"), for both unprimed and primed questions. The data are collapsed over the type 
of rating task (pairs vs. singles), since the patterns are very similar for the two priming 
conditions and that variable did not interact with any other. 13 

Insert Table 7 about here 



Analyses of variance were performed for each of the measures listed in Table 7, 
using the factors listed there and rating condition. The median of a subject's times in 
a condition was used in the analyses. For purposes of the ANOVAs, missing cells were 
estimated by using the grand mean of each group plus the individual's subject effect, 
plus the relevant condition's effect. 14 
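
One plausible reading of this missing-cell estimate is an additive model in which each effect is a deviation from the group's grand mean. The sketch below rests on that assumption (the exact computation used is not spelled out here), and the data layout and function name are hypothetical.

```python
from statistics import mean


def estimate_missing(cells, subject, condition):
    """Estimate a missing cell as grand mean + subject effect + condition effect.

    `cells` maps (subject, condition) to an observed median RT for one group;
    missing cells are simply absent. Effects are assumed to be deviations of the
    subject's and the condition's observed means from the grand mean.
    """
    grand = mean(list(cells.values()))
    subject_mean = mean([v for (s, _), v in cells.items() if s == subject])
    condition_mean = mean([v for (_, c), v in cells.items() if c == condition])
    return grand + (subject_mean - grand) + (condition_mean - grand)


# Hypothetical data: subject "s2" has no observation in the "primed" condition.
observed = {
    ("s1", "primed"): 1000,
    ("s1", "unprimed"): 1200,
    ("s2", "unprimed"): 1400,
}
estimate = estimate_missing(observed, "s2", "primed")
```

In this toy case the subject is slower than average and the condition is faster than average by the same amount, so the two effects cancel and the estimate equals the grand mean.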



Effect of Priming on Proportion of Questions Attempted. The proportion of questions 
attempted varied as a function of difficulty, and difficulty also interacted with task 
(answer vs. estimate), F(2,144) = 3.08 and 6.01, respectively, p < .01. For both tasks fewer 
hard questions were attempted, but the drop-off from easy to hard was more precipitous 
in the estimate task (replicating past results). Of more interest, there was a significant 
interaction of task with question difficulty and the priming variable, F(2,144) = 4.52, p < .01 
(one-tailed; p = .0125, two-tailed). In the estimate task, subjects estimated that they 
could answer 6% fewer primed than unprimed questions among the easy questions, 3% more 
primed questions among the moderately difficult, and 7% more among the hard ones. This 
represents a significant interaction of question difficulty with priming, F(2,72) = 3.81, 
p < .05, for the estimate subjects. 

There was no systematic effect of priming for proportion of questions attempted in 
the answer task. The interaction of question difficulty with priming was not significant in 
the answer task, p>.10. No interaction was expected there either, since people could 
not just use their "feeling of knowing" in order to answer. The effect of priming was 
not consistent across the two rating tasks in the answer task, while the pattern of 
priming effects for levels of difficulty was the same for both rating tasks in the estimate 
task. 

Effect of Priming on Time to Respond. The time to attempt an answer was, of 
course, longer for difficult questions, F(2,144) = 17.54, p < .01. There was also a 
significant interaction of difficulty with type of task, F(2,144) = 6.32, p < .01. The slow-down 
in RT with more difficult questions was greater in the Answer Condition because 
subjects had to actually find the difficult answers, while in the Estimate Condition, a 
"feeling of knowing" may come almost as quickly for difficult questions. 

Priming had opposite effects on the two tasks: Subjects were over 100 msec 
faster to estimate that they could answer questions if the questions had been primed. 
On the other hand, subjects were 200 msec slower to give an answer if the question 
had been primed. 15 Presumably the priming facilitated the "feeling of knowing" for both 
groups. For the estimate task, that was all that was needed and a decision could be 
made sooner. For the Answer Group, however, this "feeling of knowing" led them to 
look longer for answers than they would have otherwise. 

Effects of Priming on Times to Not Attempt an Answer. The result that subjects 
were slowed down by priming in the answer task is also reflected in the times for those 
questions that were not attempted. There is a marginally significant interaction of priming 
and task, F(1,72) = 3.3, p < .10, reflecting the fact that subjects were 800 msec slower 
to say they could not answer a question if it had been primed but were 35 msec faster 
for primed questions if they only had to estimate that they could answer them. There 
were significant interactions of difficulty x task, F(2,144) = 7.0, p < .01, and difficulty x 
priming x task, F(2,144) = 3.1, p < .05. The rating task gave subjects a false "feeling of 
knowing" in the answer task too. For the more difficult primed statements, they 
therefore tried longer to come up with an answer before realizing that they could not. 

DISCUSSION 

Experiment 6 supports the idea that recent exposure to concepts affects a person's 
"feeling of knowing." The results also support the idea that there is a separate process 
for initial evaluation, since the manipulation of priming had complementary effects for 
subjects asked to answer directly as compared to those asked to estimate whether or 
not they could answer: Proportion attempted was affected by priming only in the 
estimate task, since subjects in the Answer Condition had to come up with the answer; 
on the other hand, time to say "don't know" was only affected by priming in the answer 
tasks. 

One result in the data to be explained is why subjects were less inclined to 
estimate that they knew an answer to a question for primed questions that were easy to 
answer. One explanation is that subjects were aware of the priming manipulation and 
its distorting influence and were trying to counteract it. If subjects were aware that they 
were more inclined to positively estimate when the items seemed familiar from rating, 
and that their accuracy suffered as a result, they might try to counteract the 
manipulation. If so, whenever they recognized that the terms in the probe had been 
previously rated, they might raise their criterion for "feeling of knowing". 

The raising of the criterion would not have the same effect for all levels of 
question difficulty. This is because priming does not have the same effect for easy as 
it does for hard questions. Hard questions are influenced much more by priming, since 
easy ones would be attempted anyway. This correction procedure (raising the criterion) 
does not completely counteract the priming manipulation, and therefore, hard questions 
still show a bias in feeling-of-knowing; however, since priming was never needed for easy 
questions, this correction lowers estimates for primed, easy questions. 
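
The arithmetic of this account can be made concrete with a toy calculation. The scale, the size of the priming boost at each difficulty level, and the size of the criterion shift are all hypothetical numbers; the only claims taken from the text are that priming helps hard questions more than easy ones and that the correction is applied to any question recognized as primed.

```python
# Hypothetical values on an arbitrary "feeling-of-knowing" scale.
PRIMING_BOOST = {"easy": 0.0, "moderate": 0.5, "hard": 1.0}
CRITERION_RAISE = 0.4  # assumed to apply whenever primed terms are recognized


def net_bias(difficulty):
    """Net change in the tendency to say "yes" for a primed question."""
    return PRIMING_BOOST[difficulty] - CRITERION_RAISE
```

With these numbers the net bias is negative for easy questions and positive for moderate and hard ones, which matches the direction of the observed pattern (fewer primed easy questions estimated as answerable, more primed moderate and hard ones).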

This explanation is consistent with the theory, discussed earlier, that people are 
sensitive to contextual variables in the testing situation, e.g., they notice the effects of the 
priming manipulation. We know that people are capable of rapidly altering their strategy; 
it seems that subjects are trying to alter the strategy that they would otherwise select on 
the basis of such strategic decisions as "this probe contains terms that were in the 
rating task; therefore I should underestimate how easy it would be to answer." 

General Discussion: Putting the Two 
Strategy-Selection Processes Together 

This paper has argued that there exists a strategy selection stage and suggested 
two types of processes that are involved in the selection: one type is strategic processes 
that evaluate situational or contextual information and the other type is less conscious 
processes that quickly evaluate how familiar the question seems. Experiments 1, 2, and 
3 showed that we are quite sensitive to extrinsic factors when selecting a strategy. 
Experiments 4 and 5 showed that we "evaluate" sentences fast enough for initial 
evaluation of a question to be part of the strategy selection process. Experiment 6 
indicated that recency of exposure to words influences our "feeling-of-knowing" process, 
postulated to be involved in strategy selection. 

Both types of processes generate a bias for the strategy to be selected. If the 
two biases are in conflict, the stronger bias prevails. An example of the blending of 
these two processes comes from the work of Gentner and Collins (1981). They found 
that people are more likely to decide that an assertion is implausible as the assertion's 
importance and their own expertise relevant to the assertion increase. This is the "I 
would know this fact if it were true" phenomenon. If a person knows a lot about an 
area (strategic information), and there is no convergence of activation from the concepts 
in the assertion (initial evaluation), a person may be more willing to make a quick "no" 
response. Alternatively, if there is some convergence of activation, then knowing more 
about an area might lead one to spend more time searching for the information (Collins, 
personal communication). 

How does the framework relate to current views of question-answering? 

One of the major influences on theories of question-answering comes from the Cognitive 
Science Group at Yale (e.g., Schank, 1982; Reiser, Black & Kalamarides, in press; 
Reiser, Black & Abelson, 1985; Kolodner, 1983, 1984). Schank's (1972) original view 
was that all inferences are computed "on-line" during comprehension. In this way, when 
a question is asked that does not tap information explicitly presented, the answerer can 
look at the memory representation of the input and directly retrieve the information since 
it was already inferred (e.g., Schank, 1972, 1975; Schank & Abelson, 1977). Lehnert's 
(1978) work on question-answering involved inferential mechanisms at the time of test; 
however, these were not to compute the answer to the question, per se. Rather, the 
inference was to determine the true intention of the question-asker (e.g., some questions 
are really requests, as in "It is chilly in here, don't you think?" meaning "please close the 
window") or to figure out what level of answer is appropriate (e.g., would a yes or no 
answer be enough?). In her model, the answering mechanism can still be thought of as 
direct-retrieval once the question is properly determined. Graesser (1981) also seems to 
believe that large quantities of inferences are constructed during comprehension and that 
these will be used for question-answering if available. In other words, none of these 
theories consider that plausible reasoning could be a preferred question-answering 
strategy. 

Singer & Ferreira (1983) do not assume that "all" plausible inferences are made in 
advance. Nonetheless, they too believe that if an inference is made "on-line," it 
will be retrieved. Neither a stored inference nor a presented statement would ever be 
verified by inference at time of test if it could be found in memory. Test statements 
that were not inferred during comprehension would have to be computed, but that is the 
less preferred strategy. The model proposed here is different from both of these points 
of view. Regardless of whether or not the inference was computed "on-line" during 
comprehension, there is a very strong possibility that a person will not bother to try to 
find it; instead, people will often just try to compute or recompute whether or not an 



In addition, Kolodner (1980) has looked at the retrieval from memory of personal or 
"episodic" (Tulving, 1972) information. Reiser (1983) has also looked at retrieval of 
personal memories and has developed a model similar in spirit to Kolodner's. He has 
also gathered empirical support for his model (Reiser, 1983; Reiser, Black & Abelson, 
1985). The type of memory queries that their theories address concern events, e.g., 
"Did you ever go swimming in a river in Ohio?" In their framework, inferential 
mechanisms are required to answer questions; however, in this case, the inferencing 
must be done to extract the relevant search contexts and infer whether an item not 
found during initial search can be found with further search. Their view is that one 
memory retrieval can provide cues for a subsequent retrieval. This is similar to ideas 
described by Norman and Bobrow (1979), Williams and Hollan (1981), and Williams and 
Santos-Williams (1980). In their research too, memory retrieval was viewed as a 
recursive, reconstructive, problem-solving process. That is, search is a cycle of 
specification, matching, and evaluation that continually refines the descriptions of the 
items sought in light of the evaluation. Search for an item retrieves partial information, 
which is used to build a more complete specification of the target to guide further 
searches. 16 

The successive refinement strategy for finding information in memory proposed by 
Kolodner and Reiser could be thought of as yet another mechanism that can be used to 
answer questions. If this type of strategy were selected for answering the question, the 
answerer would have to predict a plausible memory location for an event that might have 
the relevant target features. When the features sought are not found there, a new 
location is tried or an assessment is made of the probability of finding the required 
information. This type of strategy was not appropriate in the experiments I reported 
because the information sought was (a) not autobiographical, and (b) relatively recent. 
Indeed, Kolodner and Reiser also distinguished "Question-Answering" from their much 
harder memory search, where the former is considered appropriate to answering 
questions from stories. For example, Kolodner states that search for relevant contexts is 
not needed for question-answering of stories read recently. 

The framework developed in this paper can be expanded to include this other type 
of question-answering task. During the first stage that measures "feeling of knowing," a 
decision that some relevant information can probably be found would bias Stage 2 
against using the direct-retrieval strategy because the terms in a probe of this kind 
would not register as having been seen recently. Because the question deals with a 
specific episode or event, an evaluation would be made that the "successive-refinement" 
search strategy would be needed. Before going on to search for the relevant 
information, an inference would be made as to the appropriate first context to search. 

How does the current framework relate to other areas of cognitive-processing? 

One idea developed in this paper is that we have a decision mechanism to select 
the dominant strategy to answer a question. This idea is in contrast to some of the 
extant models of cognition, e.g., Anderson's (1983) ACT* model, where productions 
cannot be given probabilities of firing. Productions are either part of the goal set or they 
are not. It is not obvious how the production sets or goals would be easily reordered 
in dominance, based on advice given from trial to trial. It is also not obvious how these 
ideas would be implemented by theories which propose that processing is implemented 
in massively parallel architectures (e.g., McClelland, Rumelhart & Hinton, in press; 
Fahlman, Hinton & Sejnowski, 1983). As yet, these models lack any obvious 
mechanisms for allowing a person to prefer one process to another, varying from trial to 
trial. 

The mechanisms for deciding which strategy to select, or how to allocate attention 
among competing question-answering strategies, may be quite similar to those 
mechanisms that allow a person to allocate attention in a dual-task situation. There is a 
large body of literature concerned with allocation of resources when attempting to 
perform multiple tasks. When driving a car, the amount of resources devoted to driving 
as opposed to a second task such as conversing with a passenger can shift as the 
attention demands of the driving task change. 

Just as "feeling of knowing" can affect what strategy is selected in question-answering,
"feeling of competence" could affect how much attention is devoted to one
task as opposed to another. The controlled, strategic processes that are influenced by
external factors in question-answering could also monitor the feedback from the
environment in the dual-task setting to see whether the tasks are being performed well
enough (or one too well, and the other not well enough). Advice such as "watch the
road" or "listen to this argument for why my theory is better" may cause the allocation
of resources to shift, just as the advice to "try to find the fact in memory" or "try to
infer the answer" can affect question-answering behavior.

A considerable portion of the research on attention concerns allocation of resources
in a dual-task situation (e.g., Navon & Gopher, 1979, 1980; Kinchla, 1980; Wickens,
1980; Moray & Fitter, 1973; Moray, 1967; Shaw, 1980; Norman & Bobrow, 1979;
McLeod, 1977; Sperling & Melchner, 1979; Schneider & Shiffrin, 1977; LaBerge, 1975;
Gopher & North, 1974, 1977). Some of this research is concerned with whether the
processes for dual tasks operate in parallel or sequentially, whether the subject is
strategic in resource allocation between the two tasks, and whether time-sharing or
resource allocation abilities are learned and can be improved (e.g., Fisher, 1975(a,b),
1977, 1980; Schweickert, 1978, 1980, 1983; Shaw, 1980; Lane, 1980; Damos &
Wickens, 1980).



Work by Posner, Nissen and Ogden (1978) gave impetus to the idea that we can
allocate attention differentially using strategic information. They looked at how
response times to the onset of a light varied as a function of the validity of a directional
cue. Subjects performed faster with a valid cue (correct advice as to which position
would light) than when not given a cue, and performed slower with an invalid cue.

In this literature, the strategic components that allocate resources are called
"time-sharing" skills. Lane (1980) and Damos and Wickens (1980) have looked at the
acquisition of time-sharing skills. Lane found that the difference between central and
incidental task performance increases with age, and that correlation between the two
tasks decreases with age. However, contrary to the common assumption that such
developmental differences are due to acquired ability to process information selectively,
they may be due to capacity trade-offs, i.e., devoting most of the processing resources
to one of the dual tasks. Damos and Wickens found that time-sharing skills are learned
with practice and can transfer to new dual-task combinations. Navon and Gopher (1979)
also found that with extensive practice, subjects seemed able to achieve higher levels of
performance on both tasks. One possibility they consider is that the processes involved
in the allocation of resources get better with practice.

Extensions 

Although the strategies involved in other cognitive tasks are undoubtedly different
from those for question-answering, it is interesting to conjecture on the similarity of the
factors affecting strategy selection. Consider, for example, mathematics. We can first
assess our familiarity with a problem. Certain classes of problems we recognize that we
can work out without consulting a math book, while others require looking up the formula
for solution. Even within multiplication we may assess how familiar a problem is before
selecting a procedure for solution. Common but complex problems such as 12 x 12 are
recognized as directly stored, but 16 x 12 might be evaluated as one that should be
broken down into subcomponents to solve. Siegler and Shrager (1984) have evidence
that children do similar things for arithmetic although they apparently always try direct
retrieval before computing the answer.
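
The multiplication example can be made concrete with a small Python sketch. It is an illustration only and not a model proposed in this paper: the lookup table KNOWN_PRODUCTS, its assumed 12 x 12 extent, and the function names select_strategy and multiply are all hypothetical. The sketch first evaluates whether a problem looks directly stored and only then chooses between retrieval and decomposition.

```python
from typing import Tuple

# Hypothetical store of "directly retrievable" products (an assumption:
# every product up to 12 x 12 is treated as memorized).
KNOWN_PRODUCTS = {(a, b): a * b for a in range(1, 13) for b in range(1, 13)}


def select_strategy(a: int, b: int) -> str:
    """Familiarity evaluation: choose a strategy before doing any work."""
    return "retrieve" if (a, b) in KNOWN_PRODUCTS else "decompose"


def multiply(a: int, b: int) -> Tuple[str, int]:
    """Return the strategy chosen and the product it yields."""
    strategy = select_strategy(a, b)
    if strategy == "retrieve":
        return strategy, KNOWN_PRODUCTS[(a, b)]
    # Decompose: split the first operand into tens and units and sum the
    # partial products (e.g., 16 x 12 = 10 x 12 + 6 x 12).
    tens, units = divmod(a, 10)
    return strategy, tens * 10 * b + units * b


print(multiply(12, 12))
print(multiply(16, 12))
```

In this sketch 12 x 12 is answered by retrieval and 16 x 12 by decomposition. The point of the design is only that the selection step is separate from, and precedes, the execution of either strategy.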

In addition to familiarity evaluation, it is probably also the case that in many
domains prior history of success with a strategy influences selection of the strategy.
Again using mathematics as an example, consider the task of solving integrals. If one
integration technique has worked for many problems, one is likely to try it again unless
the initial evaluation suggests some other procedure. The Einstellung effect in solving
water jug problems (Luchins, 1942) is an example of where prior history of success with
a (solution) strategy adversely affects strategy selection. The Einstellung effect is
eliminated for approximately 50% of the subjects when they are simply instructed "Don't
be blind." This is understandable since "prior history of success with a strategy" is
part of a selection mechanism under conscious control.

For almost any complex cognitive task there are multiple strategies that can be
used for solving it, typically with one more appropriate than another. As people
become proficient in performing the task, they also become relatively proficient in
selecting the best strategy to use for a particular instance. This paper has provided a
framework for understanding the role of strategy selection in question-answering and has
suggested what variables affect the selection process.

Notes

1 For example, it is common for memory theorists to assume that Statement A was
inferred during the reading of a text if verification of it is faster than verification of a
different Statement B (e.g., Garnham, 1982; Singer & Ferreira, 1983). Another
interpretation, of course, is that Statement A is more plausible than Statement B and
therefore faster to verify (judge plausible) at test, regardless of which statements had
been inferred earlier.

2 Just as a presented plausible statement can be verified by either strategy, a contradictory
statement can be rejected by either strategy, viz., finding its exact contradiction in
memory and noting the discrepancy or by inferring that it is implausible. Note that
failing to find a fact using the direct retrieval strategy is not a valid basis for deciding
that a fact was implausible.

3 Note that the difference in response times is not due to different responses to
not-stated items: Both groups are expected to respond affirmatively to not-stated 
plausible probes. 

4 This issue will be discussed at length later in the paper.

5 The only probe that had lower accuracy when "stated" in the story was the
contradictory statement when direct-retrieval was advised. Perhaps this results from
subjects not bothering to note the one opposite word in the almost-verbatim match of
the memory trace to the contradictory probe. That is, the direct-retrieval strategy is a
"literal" match strategy and sometimes subjects are sloppy and do not check every
word.

6 ANOVAs using only the correct answer times and those including impossibles look
very similar and do not change any interpretations.

7 It is unclear why the differences were not manifest in RTs. One explanation is
that the difference among levels of difficulty was really probability of being able to
answer the question, not difficulty of answering, per se. Consider, for example, one of
the difficult questions, "Bowie Kuhn is (was) commissioner of what sport?". It is
probably not difficult for those who know the answer (baseball). If a person knows the
answer, it can be retrieved as fast as many questions considered much easier, e.g.,
"How many eggs are in a dozen?". The difference among the question types then
would be seen only in measures such as the percentage of questions attempted.

8 The pay-off formula was: 3.8/(0.15*rt+0.4)-3.7, rounded to the nearest half point.
In the Estimate Group, subjects were given an additional 3 points for a correct response
and lost 3 points for an incorrect response. In the Answer Group, they received the
same pay-off for accurate and inaccurate responses so long as they were given in less
than one second after the button press; if it took longer than that to speak the answer,
subjects lost 4 points for a correct answer and 6 for an incorrect.
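
As a worked illustration of the formula in Note 8, the following Python sketch computes the speed-based pay-off and the Estimate Group adjustment. Several things here are assumptions rather than facts from the experiment: that rt is measured in seconds, that the 3-point adjustment is added to the speed-based score, the function names, and the treatment of exact ties (Python's round resolves them to the nearest even value). The Answer Group rules are not modeled.

```python
def base_payoff(rt_seconds: float) -> float:
    """Speed-based pay-off: 3.8/(0.15*rt + 0.4) - 3.7, to the nearest half point."""
    raw = 3.8 / (0.15 * rt_seconds + 0.4) - 3.7
    # Doubling, rounding, and halving snaps the score to a multiple of 0.5.
    return round(raw * 2) / 2


def estimate_group_payoff(rt_seconds: float, correct: bool) -> float:
    """Assumed combination: base pay-off plus 3 points if correct, minus 3 if not."""
    return base_payoff(rt_seconds) + (3.0 if correct else -3.0)


for rt in (0.0, 2.0, 10.0):
    print(rt, base_payoff(rt))
```

Under these assumptions the speed-based score falls from 6 points for an instantaneous response to 1.5 points at 2 seconds and to -1.5 points at 10 seconds, so slow responses are penalized regardless of accuracy.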

9 The reason why the RTs for the first phase are so much longer than the second 
phase is that the first phase RTs include the reading time of the question. 

10 A common word like "table" would have to be much more active than a rare
word like "hippopotamus" for the feeling-of-knowing process to recognize it as having
been seen recently. See, for example, Just & Carpenter (1980) or McClelland &
Rumelhart (1981) for a fuller discussion of similar assumptions.

11 Some questions such as "what is Dickens' phone number?" may seem to be
rejected from this initial evaluation; however, it is more likely that the decision is made
to use a particular question-answering strategy, and then it is determined that the
question is unanswerable.

12 The terms were sometimes longer than one word, e.g., "Lady Godiva", "Howdy
Doody".

13 Subjects were significantly faster and more likely to attempt answers in the
primed conditions, for both the estimate and the answer task, although much more so for
the estimate task. The faster judgments did not interact with other variables. Although
it is possible to construct explanations for the modest differences across conditions, it is
more likely that any differences are due to subject effects, and dwelling on these slight
differences would confuse the picture.

14 The means in the Tables reflect the obtained scores without the estimates for
missing or undefined observations. The F statistics vary depending on the estimation
procedure used for the missing observations. The Fs in question are those for the
response times. For example, if a subject did not attempt any difficult primed questions, then an
estimate of the time for that subject to attempt a difficult, primed question was required.
The Fs reported were the most conservative, i.e., gave the smallest Fs, of the various
estimation procedures. Proportion attempted, on the other hand, was not affected since
there were no missing observations. Therefore, those statistics are not subject to
interpretation.

15 Although this interaction was not reliable, it was reliable with other procedures
used for estimating missing data. 

16 The proposal that one retrieval can facilitate another retrieval has also appeared
in models of free and cued recall, e.g., Shiffrin (1970); Raaijmakers and Shiffrin (1981).

Table 1
Example Story

We are out of touch with problems which were central in the past.
But this is not true everywhere.
The setting is Burma.
The tiger was a man-eater.
It suffered from an old gunshot wound.
The villagers did not dare work in the fields.
In a disorderly meeting they made a decision.
They asked a hunter to help them.
He came the following week.
The tiger killed a man.
It had attacked him in a small ravine.
It carried the victim away.
It concealed the kill under some vines.
The hunter followed the tiger's trail.
The traces were distinct.
He found the tiger asleep.
The shade was cool there.
The hunter considered giving the tiger a sporting chance.
Then he shot it.
The tiger died quietly.
The hunter did not feel right.
The villagers understood his feelings.
But their concern was practical.
The hunter skinned the tiger.
He left with the skin.


Statements to Judge*

Highly Plausible:
The tiger was dangerous.
The villagers were afraid of the tiger.
The villagers were happy that the tiger was dead.

Moderately Plausible:
The hunter was expected to solve their problem.
The hunter thought of the tiger's situation.
The hunter used his best judgement.

Implausible:
The villagers had encountered many man-eating tigers.
The hunter was well paid for killing the tiger.
The villagers make their living primarily by hunting.
There are no guns in Burma.
The villagers are Hindu and do not believe in killing.
The village chieftain wore the tiger skin over his hips.

Contradictory:
The tiger was awake when the hunter found him.
The villagers live in Nepal.
The hunter left the tiger's pelt with the villagers.
The tiger died fighting.
The villagers scorned the hunter's feelings.
The tiger concealed his victim in a cave.

*Although 6 implausible and 6 contradictory statements are listed here, only 6 of the
possible 12 were selected for any story. This ensured an equal number of true and false
test statements.

Table 2



Mean Response Times (and Accuracy) to Make Verification Judgments in
Experiment 1

                   BIASED FOR DIRECT RETRIEVAL     BIASED FOR PLAUSIBILITY
                   (80% STATED)                    (80% NOT-STATED)

                   Stated       Not-Stated         Stated       Not-Stated

Biased: Stories 1-6

Highly             2.12         3.13               2.56         2.60
                   (.92)        (.88)              (.90)        (.90)
Moderately         2.27         3.13               2.84         3.21
                   (.90)        (.77)              (.92)        (.74)
Implaus/Contra*    2.57         2.80               3.11         2.96
                   (.82)        (.84)              (.86)        (.89)

Returned to 50% Stated: Stories 7-10

Highly             2.15         2.31               2.30         2.57
                   (.95)        (.91)              (.98)        (.95)
Moderately         2.31         3.02               2.33         2.97
                   (.94)        (.?3)              (.96)        (.82)
Implaus/Contra*    2.31         2.49               2.61         2.90
                   (.86)        (.92)              (.89)        (.90)

*The implausible statements were never presented in the story. The
contradictory statements contradicted an explicitly presented statement
by substituting an opposite for one word in the statement.

Table 3



Mean Response Times and Accuracy For Judgments in Experiment 2

OFFICIAL TASK      RECOGNITION             PLAUSIBILITY

                   % CORRECT    RT         % CORRECT    RT

IMMEDIATE
Med. Stated        90.56        2.103      93.00        1.964
High Stated        93.90        2.125      94.47        1.849
Implausible        96.06        2.178      89.86        2.300

DELAYED
Med. Stated        88.54        1.979      91.09        2.320
High Stated        90.00        1.889      94.89        2.054
Implausible        91.24        1.983      88.P7        2.519

Table 4

Mean Response Time and Accuracy for Verification Judgments as a
Function of Advice Type, Plausibility and Whether the Probe Had Been
Presented in the Story

                        INFERENCE ADVICE        DIRECT RETRIEVAL ADVICE

                        % CORRECT    RT         % CORRECT    RT

Inf.(+) Dir. ret.(-)
  Med, not stated       74.54        2.706      82.35        3.281
  High, not stated      92.81        2.268      91.18        2.214
  Implausible           80.46        2.530      84.90        2.738

Dir. ret.(+) Inf.(-)
  Med, stated           92.16        2.444      93.46        2.049
  High, stated          98.04        2.101      99.35        2.001
  Contradictory         84.12        2.683      76.41        2.501

Table 5



Time to Estimate or Attempt Answers, Proportion of Questions
Attempted, Proportion Answered Correctly and Accuracy of Attempts as a
Function of Task and Question Difficulty, Experiment 4.

                  Time               Proportion         Total              Accuracy
Question          to Attempt         Attempted          Correct            of Attempt
Difficulty        EST      ANS       EST      ANS       EST      ANS       EST      ANS

Easy              1.650    2.512     86.02    91.62     80.62    80.35     93.60    87.54
Moderate          1.828    2.654     67.46    72.29     59.48    54.97     87.98    76.08
Hard              1.728    2.388     42.38    47.09     34.13    28.42     81.44    59.07
Impossible        1.524    2.925      7.56    25.13      0.00     0.00      0.00     0.00

All Attempted*    1.735    2.518     65.29    70.34     58.08    54.5*     87.6/    74.22

                                    ESTIMATE    ANSWER

RT for Answered Correctly*          1.722       2.396
RT for Answered Incorrectly*        2.069       3.009
RT for Not Attempted*               1.851       3.197

*These means do not include the impossible items. The pattern is quite
similar with their inclusion.

Table 6



Mean Proportion Attempted, Correct, Accuracy and Response Times to
Attempt Correct Answers, Give Answers, as a Function of Task and
Question Difficulty in Experiment 5

                  RT1: Time          Proportion         Total              Accuracy
Question          to Attempt         Attempted          Correct            of Attempt
Difficulty        EST      ANS       EST      ANS       EST      ANS       EST      ANS

Easy              1.385    1.580     84.13    85.85     78.70    79.89     93.27    95.99
Moderate          1.447    1.671     68.69    68.94     60.67    59.83     88.43    88.40
Hard              1.427    1.775     40.47    37.51     32.69    29.96     80.98    80.34
Impossible        -        -          6.68     5.00     -        -         -        -

All Attempted*    1.420    1.675     64.43    64.10     57.35    56.56     87.56    88.24

Question          RT2                RT1 + RT2
Difficulty        EST      ANS**     EST      ANS**

Easy              0.590    0.304     1.975    1.884
Moderate          0.595    0.274     2.042    1.945
Hard              0.541    0.317     1.968    2.092
Impossible        -        -         -        -

All Attempted*    0.575    0.298     1.995    1.974

- Undefined

*These means do not include the impossible items.

**RT2 in Answer Condition does not include late responses
(less than \% of data).

Table 7



Proportion of Questions Attempted, Time to Attempt Answer or Time to
Say "Don't Know" as a Function of Task, Question Difficulty and Priming.

ESTIMATE

              PROP. ATTEMPTED        RT TO ATTEMPT          RT TO NOT ATTEMPT
              UNPRIMED   PRIMED      UNPRIMED   PRIMED      UNPRIMED   PRIMED

Easy          .82        .76         2.41       2.32        3.80       3.45
Moderate      .50        .52         2.79       2.58        3.05       3.34
Hard          .24        .31         2.72       2.70        2.80       2.75

ANSWER

              PROP. ATTEMPTED        RT TO ATTEMPT          RT TO NOT ATTEMPT
              UNPRIMED   PRIMED      UNPRIMED   PRIMED      UNPRIMED   PRIMED

Easy          .77        .81         3.42       3.64        7.56       6.76
Moderate      .54        .48         4.58       4.67        6.04       8.30
Hard          .39        .38         5.10       5.47        4.59       5.51

Figure Captions
Figure 1. Flowchart model of strategy selection from Reder (1982).

Figure 2. Difference in RT between highly and moderately plausible statements (collapsed
over stated/not-stated) as a function of strategy-bias, when bias imposed (stories 1-6)
and not imposed (stories 7-10) in Exp. 1.

Figure 3. Difference in RT between stated and not-stated statements (collapsed over
plausibility) as a function of strategy-bias, when bias imposed (stories 1-6) and not
imposed (stories 7-10) in Exp. 1.

FIGURE 1
Process Model for Making Plausibility
Outcomes shown: "decide plausible" leads to a "yes" response (Time: A+G); "decide implausible" leads to a "no" response (Time: A+G+l).